Queenside Castling on Cardano: Bitwise Operations for Better Solutions and Improved Plutus Scripts

Introduction

Currently, MLabs is undertaking the work of bringing bitwise primitive operations to Plutus Core, specifically CIP-122 and CIP-123. As part of this work, we were asked to provide some benchmarks demonstrating how such primitive operations can be applied. We decided that one of these should be a backtracking solver for the 8-queens problem. While designing and implementing this, we came across an interesting solution that we thought would be fun to share and demonstrate.

In this article, we will describe two solutions to the n-queens problem, both using operations over a bit-oriented structure. Our first solution will be intuitive, while our second will demonstrate how we can improve on our first attempt, both in theory and in practice. We will also cover some interesting topics relating to bitwise operations, mutable operations, and how we can represent bit-packed structures, specifically bitvectors.

No knowledge of Plutus or Plutus Core is required to enjoy this article. We refer to some intermediate Haskell concepts (most notably the use of ST and mutability), as well as some light asymptotic analysis, but our goal is to make this article accessible to all Haskellers.

The problem

As the title suggests, we're going to talk about the venerable n-queens problem. To recap, the problem statement is this:

Given a dimension n, give n sets of row-column coordinates on an n by n chessboard such that, if a queen is placed at each such position, no two queens would threaten each other.

This problem has been used as a puzzle, as well as a demonstration of backtracking performance, for a considerable time[1]. While completely analytical solutions exist, which technically make backtracking unnecessary, we will work on a backtracking-oriented solution here.

Furthermore, we're going to try and make use of a bitwise approach, specifically making use of (a flavour of) rank-select dictionary to help us. More specifically, we will be using such a structure to keep track of our search state while demonstrating (both theoretically and practically) how this can be beneficial. To keep the presentation manageable (as rank-select dictionaries are quite a deep topic), we will use a simplified presentation of the relevant operations here.

Basics

Before we present any solutions, we cover a few basic ideas that we will use throughout this article. We assume all indexing is zero-based. Furthermore, to make the presentation less fiddly, we will assume that any n we are given is an exact multiple of 8[2].

Backtracking and n-queens

We make the following observation:

Any valid n-queens solution will use each row and column exactly once.

To see why this is the case, first note that if some row or column went unused, then, because we have to place n queens, some other row or column would have to be used more than once. Furthermore, if we placed two (or more) queens in the same row or column, our solution would immediately fail, as they'd threaten each other.

Based on this observation, we can follow a row-by-row approach that does something similar to this:

  1. Try placing a queen somewhere in the current row; if you can't, backtrack.

  2. Mark all positions the queen from 1 would threaten in the rows below.

  3. Repeat for the row below; if you get backtracked to, unmark, then try another position. If you can't find one, backtrack again.

Vital to this approach is the ability to track the state of our search. Specifically, we need to know:

  • What choices of placement we have at the current row;

  • What positions are 'marked'; and

  • What choice ordinal (first, second, etc) we're up to on the current row

This information must be modifiable but also restorable because every time we backtrack we may need to 'undo' any modifications potentially all the way back to the start. We also want to use as little space as possible to represent this information: more space used will mean more work for us every time we backtrack or commit.

Bitwise

Throughout, we will use a bitvector data structure. We can think of a bitvector as an array of individual bits packed as tightly together as possible. Such structures have a range of applications, but in our case, we will use them as extremely compact arrays of Bools. There are numerous reasons why this is beneficial: not only is the required memory pretty much as low as possible (indeed, bitvectors are an implicit data structure), the data is also packed densely in memory, making it cache-friendly and efficient.

There are a range of data types provided to us by Haskell libraries that could be used to implement a bitvector type:

  • Strict ByteString from bytestring

  • ByteArray from primitive (or its corresponding unboxed variant from base)

  • UArray i w from array, where w is any type with a FiniteBits instance

  • Unboxed Vectors from vector of any type with a FiniteBits instance (or of Bit from bitvec)

  • Storable versions of all the above

  • massiv array versions of all the above

We will use ByteArray for our description, for three reasons: firstly, it is now available in base (although its operations still reside in primitive for now); secondly, it has a small and simple API, which helps our presentation; and thirdly, it has a mutable counterpart in MutableByteArray, which we can use in ST to express what we want to do more clearly.

Defining this type is straightforward:

newtype Bitvector = Bitvector ByteArray
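
For reference, the snippets in this article assume roughly the following extensions and imports. This is a sketch: the exact set may vary slightly with your versions of base and primitive.

{-# LANGUAGE BangPatterns #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE TupleSections #-}

import Control.Monad.ST (ST, runST)
import Data.Bits (bit, complement, countTrailingZeros, popCount,
                  unsafeShiftL, unsafeShiftR, (.&.), (.|.))
import Data.Foldable (foldl', for_, traverse_)
import Data.Kind (Type)
import Data.Word (Word8)
import Data.Primitive.ByteArray (ByteArray, MutableByteArray)
import qualified Data.Primitive.ByteArray as BA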

The next choice we have to make is the bit indexing scheme. This is because ByteArray is oriented around byte accesses, not bit accesses, and we have several choices in how we can access individual bits in a packed, byte-oriented structure. We use a 'right-to-left' indexing scheme; namely, the bit at position 0 is the rightmost bit (the least-significant bit of the last byte), and indexes proceed to the left (increasing significance within bytes, and decreasing byte index across them). For an example, consider the Bitvector whose representation is a length 2 ByteArray with (byte) index 0 being 0x8F and (byte) index 1 being 0x37. Then, its bit indexes will look as follows:

Bit indexes  | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
Bit values   | 1  | 0  | 0  | 0  | 1  | 1  | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 1 |
Byte indexes |                 0                   |               1               |

We make this choice primarily because, in this setting, bit indexing within a Bitvector agrees with bit indexing for individual bytes (by way of the Bits type class), and because it keeps consecutive bit indexes within a byte consistent with consecutive bit indexes across bytes. This will make certain operations easier to define. Unfortunately, this choice forces us to 'run backwards' as compared to byte indexes, but since we don't need to access whole bytes of a Bitvector as part of our required API, we won't worry about it.
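
To make the arithmetic behind this scheme concrete, here is a small, hypothetical helper (not used in the rest of the article) that maps a bit index to its byte index and bit offset, given the length of the underlying ByteArray in bytes:

-- Hypothetical helper, for illustration only: where a given bit index lives.
-- For the two-byte example above, bit 11 is bit offset 3 of byte 0,
-- and bit 2 is bit offset 2 of byte 1.
bitIxToByteIx :: Int -> Int -> (Int, Int)
bitIxToByteIx len ix = (len - (ix `quot` 8) - 1, ix `rem` 8)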

We define one construction operation for Bitvectors, which produces a Bitvector storing n bits, all of which are clear. Due to our restriction to exact multiples of 8 for n-queens dimensions, we won't worry about being given a length that's not a multiple of 8.

zeroes :: Int -> Bitvector
zeroes n = runST $ do
  let !len = n `quot` 8
  result <- BA.newByteArray len
  BA.fillByteArray result 0 len 0x00
  Bitvector <$> BA.unsafeFreezeByteArray result

We will use this pattern of mutable allocation, followed by mutable operations, followed by freezing, throughout this article. We do this inside ST, as this allows us to minimize the amount of copying we need to do, as well as giving us control over when any copying occurs. This is convenient with MutableByteArray, which works in any PrimMonad: a capability-based type class indicating that we have access to mutable state. The processing here simply writes a zero byte to every position.

'Freezing' in this context refers to converting from a mutable to an immutable representation of the same data. There is a reverse counterpart to this operation called 'thawing' that converts an immutable representation to a mutable one. Each comes in two variants: a 'safe' one (which copies the memory before converting it), and an 'unsafe' one (which doesn't). Which to use depends on whether the memory being operated on could be aliased (namely, someone other than our function would be able to see its changes). In this case, we use the 'unsafe' freeze operation, as the memory is newly allocated. Nobody else can have a reference to it.
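
As a minimal sketch of this distinction (assuming the imports above), the following hypothetical helper copies a ByteArray by safely thawing it (which copies) and then unsafely freezing the result (which doesn't):

-- Hypothetical helper, for illustration only: the safe thaw performs the
-- copy, so the unsafe freeze is fine, as nobody else can alias the memory.
copyViaThaw :: ByteArray -> ByteArray
copyViaThaw ba = runST $ do
  let !len = BA.sizeofByteArray ba
  thawed <- BA.thawByteArray ba 0 len
  BA.unsafeFreezeByteArray thawed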

One key operation we will need is the ability to set bits at an index (that is, make the bit at that position into a 1). Because Bitvector is an immutable type, we don't modify it in place: instead, we return a new copy with the change applied.

setBit :: Bitvector -> Int -> Bitvector
setBit (Bitvector ba) ix = runST $ do
  let !len = BA.sizeofByteArray ba
  thawed <- BA.thawByteArray ba 0 len
  let !bigIx = len - (ix `quot` 8) - 1
  let !smallIx = ix `rem` 8
  !originalByte :: Word8 <- BA.readByteArray thawed bigIx
  let !newByte = originalByte .|. bit smallIx
  BA.writeByteArray thawed bigIx newByte
  Bitvector <$> BA.unsafeFreezeByteArray thawed

This operation is similar in structure to zeroes, except instead of allocating new memory, we use a 'safe' thaw to first copy our input Bitvector, then mark that copy as mutable. This is exactly what we want, as then the original Bitvector we passed as an argument won't be modified. To set the bit at the given index, we do the following:

  1. Split the index into a 'byte part' (bigIx) and a 'bit part' (smallIx). The bigIx will need adjustment as it will be 'from the end'.

  2. Read the byte at bigIx from the original Bitvector.

  3. Use a bitwise OR, combined with a single-bit mask based on smallIx, to set the required bit in the byte read at 2.

  4. Write back the result from 3 to the mutable copy of our original Bitvector at bigIx.

This operation requires a copy of the entire input Bitvector. We may sometimes want to set many bits in a row, which means an operation like this would make sense to have:

setBits :: Bitvector -> [Int] -> Bitvector

We could implement it as a fold:

setBits = foldl' setBit

However, this would be extremely inefficient: for a bitvector of length n and a list of m changes, we would have to copy the bitvector m times (for a total of θ(mn) work), only to throw all but the last copy away. To avoid this, we instead factor out the mutable bit-setting code and define both setBit and setBits relative to that helper:

setBit :: Bitvector -> Int -> Bitvector
setBit (Bitvector ba) ix = runST $ do
  let !len = BA.sizeofByteArray ba
  thawed <- BA.thawByteArray ba 0 len
  setBit' thawed len ix
  Bitvector <$> BA.unsafeFreezeByteArray thawed

setBits :: Bitvector -> [Int] -> Bitvector
setBits (Bitvector ba) ixes = runST $ do
  let !len = BA.sizeofByteArray ba
  thawed <- BA.thawByteArray ba 0 len
  traverse_ (setBit' thawed len) ixes
  Bitvector <$> BA.unsafeFreezeByteArray thawed

setBit' ::
  forall (s :: Type).
  MutableByteArray s ->
  Int ->
  Int ->
  ST s ()
setBit' mba !len !ix = do
  let !bigIx = len - (ix `quot` 8) - 1
  let !smallIx = ix `rem` 8
  !originalByte :: Word8 <- BA.readByteArray mba bigIx
  let !newByte = originalByte .|. bit smallIx
  BA.writeByteArray mba bigIx newByte

The passing of len is a little redundant here, as we have this information available in constant time (MutableByteArrays are counted). Yet, since we have to obtain this information anyway in order to thaw, we may as well use it as much as we can. This way, we only need θ(n) space regardless of how many changes we need to make.

The last operation we will need is one of the two standard operations provided by rank-select dictionaries: select, or more specifically select0. Traditionally, select0, when given a structure and a rank (first, second, etc.), returns the index of the 0 bit with that rank. For example, if we ask for the second 0, this will give the index of the 0 whose index is the second lowest. If we request too high a rank, select0 can fail; thus, we return the result in Maybe.

Additionally, we augment select0 to also allow us to specify a subrange of a Bitvector's bits in which to search. We specify this range as a pair of (start, span), where start is the beginning of the subrange we are interested in, and span is how many bits the range extends from the beginning. For simplicity, we will assume these are both multiples of 8.

selectIn :: Bitvector -> (Int, Int) -> Int -> Maybe Int
selectIn (Bitvector ba) (start, span) !which = do
  let !len = BA.sizeofByteArray ba
  let !startIx = len - (start `quot` 8) - 1
  let !stopIx = startIx - (span `quot` 8)
  go startIx stopIx which
  where
    go :: Int -> Int -> Int -> Maybe Int
    go !ix !lim !remaining
      | ix == lim = Nothing
      | otherwise = do
          let !compCurrentByte :: Word8 = complement . BA.indexByteArray ba $ ix
          -- Since we complemented the byte, this counts zeroes
          let !counted = popCount compCurrentByte
          if counted > remaining
            then pure . dig compCurrentByte $ remaining
            else (8 +) <$> go (ix - 1) lim (remaining - counted)
    -- Goes through the complemented byte to find our position
    dig :: Word8 -> Int -> Int
    dig !src !pos =
      let !counted = countTrailingZeros src
       in if pos == 0
            then counted
            else
              let !offset = counted + 1
               in offset + dig (unsafeShiftR src offset) (pos - 1)

The operation proceeds in three stages:

  1. Determine, based on start and span, what the index range for the search should be. This is similar to how we do bit indexing.

  2. Starting from the highest byte index in the range determined at 1, read the byte at that index, complement it, then determine its Hamming weight. This tells us the number of 0 bits in this byte.

  3. If the required rank is less than the Hamming weight calculated in 2, go to step 4. Otherwise, add 8 to the result of recursively searching in the next index down, reducing the required rank by the Hamming weight. If we run out of bytes without having reached step 4, fail with Nothing.

  4. 'Count off' zero bits in the byte from 2 until we find the one that has the required rank. Return its position (which will accumulate with step 3 if needed).

This is a somewhat inefficient method of calculating select0: more capable rank-select dictionaries use a range of techniques to make this operation more efficient both asymptotically and in practice. To simplify the presentation of our ideas, we won't delve into this topic.
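
As a quick sanity check of the behaviour described above, here is a small usage example (assuming the definitions so far): in an 8-bit Bitvector with bits 1 to 4 set, the clear bits sit at indexes 0, 5, 6 and 7, so asking for the 0 bit of rank 1 (that is, the second 0) over the whole range gives index 5.

ghci> selectIn (setBits (zeroes 8) [1, 2, 3, 4]) (0, 8) 1
Just 5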

First attempt

With the preceding operations and insights, we are ready to try and implement an n-queens solver.

We'll start by choosing the first representation of our state that could possibly work: an n by n bitmatrix (two-dimensional bitvector), represented in row major order. We treat set bits in this bitmatrix as marked and clear bits as, well, clear. Essentially, we represent the entire board, marking it as we descend. The board starts out completely clear.

We first define a data type to represent boards, as well as a way to construct a blank one:

data Board = Board Int Bitvector

-- Assume that dim is a multiple of 8
blankBoard :: Int -> Board
blankBoard n = Board n . zeroes $ n * n

We have to store the dimension, as otherwise, we have no way of knowing exactly how large each row is. While we could potentially avoid this, it would require computing square roots, which isn't particularly pleasant. We also note that this representation is quadratic relative to n (or θ(n^2) if you prefer).

At each stage of our search, we have a board, a current row, and a current choice ordinal. We start with a blank board, row 0, and choice ordinal 0. Our first step is to determine what column our current choice ordinal refers to. To illustrate this, consider the following row with X indicating a marked position and _ indicating an unmarked one:

0 1 2 3 4 5 6 7
_ _ X X X _ X X

Here, choice ordinal 2 (that is, the third option) corresponds to position 5. However, if we instead had a row like this:

0 1 2 3 4 5 6 7
_ X X X X X _ _

then choice ordinal 2 would instead correspond to position 7. It's also possible that we request a choice ordinal that simply does not exist. In this row

0 1 2 3 4 5 6 7
_ X X X X _ X X

choice ordinal 2 is unavailable (as there is no third available position).

To determine the corresponding position (if any), we can use the selectIn operation we described previously:

-- Choice ordinal argument is passed implicitly (via partial application)
availableInRow :: Board -> Int -> Int -> Maybe Int
availableInRow (Board dim vec) row = selectIn vec (row * dim, dim)

This is why we chose a row-major ordering for our bitmatrix, as it keeps row bits adjacent to each other. This also ties into our choice of bit indexing scheme. We also see why we need to store the dimension together with a Board: without this information, it would be difficult to determine what the start index, or span, that we need for any given row is.
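
To connect this with the first example row above (assuming the definitions so far), we can mark columns 2, 3, 4, 6 and 7 of row 0 on an 8 by 8 board and ask for choice ordinal 2; as expected, this corresponds to position 5:

ghci> availableInRow (Board 8 (setBits (zeroes 64) [2, 3, 4, 6, 7])) 0 2
Just 5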

If we fail to find a position corresponding to a given choice ordinal, we are 'locked out', because any queen placement in this row would threaten a queen placed previously. In that case, we backtrack. Otherwise, suppose we have a position pos from availableInRow. What we do next depends on which row we are in:

  • If we are in the last row, there are no more queens to place, so we 'lock in' all our placements and produce the solution.

  • Otherwise, we mark all positions on our Board that a queen placed in the column pos would threaten and attempt to move to the next row.

The attempt in the next row might fail; if we are backtracked to, we raise the choice ordinal and try this whole process again.

When marking our board, we must mark the entire column pos, as well as the left and right diagonals extending from our current position downwards. We don't have to consider upward diagonals, as we cannot affect queens we have placed in such diagonals[3]. We do this marking in one step, using our previously-specified setBits function:

markUnavailable :: Board -> Int -> Int -> Board
markUnavailable (Board dim vec) row col =
  Board dim $ setBits vec updateIxes
  where
    updateIxes :: [Int]
    updateIxes =
      map rowColToIx $
        -- Our column
        map (,col) subsequentRows
          -- Left diagonal below us
          <> zip subsequentRows [col - 1, col - 2 .. 0]
          -- Right diagonal below us
          <> zip subsequentRows [col + 1, col + 2 .. dim - 1]
    subsequentRows :: [Int]
    subsequentRows = [row + 1, row + 2 .. dim - 1]
    rowColToIx :: (Int, Int) -> Int
    rowColToIx (r, c) = r * dim + c

The column positions are straightforward. The left and right diagonals work as described because, given pos as a marked queen position, the column corresponding to its left diagonal k rows down will be pos - k (to a minimum of 0), and the column corresponding to its right diagonal k rows down will be pos + k (to a maximum of n - 1). We can use the inherent truncating behaviour of zip to ensure that we don't accidentally produce an invalid position.
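
As a worked example (assuming the definitions so far, and matching the diagrams in the next section), committing a queen at row 0, column 2 on an 8 by 8 board would mark the following positions:

-- Column below:   (1,2), (2,2), ..., (7,2) -> indexes 10, 18, 26, 34, 42, 50, 58
-- Left diagonal:  (1,1), (2,0)             -> indexes 9, 16
-- Right diagonal: (1,3), (2,4), ..., (5,7) -> indexes 11, 20, 29, 38, 47
example :: Board
example = markUnavailable (blankBoard 8) 0 2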

Putting this all together, we have:

nqueens :: Int -> [(Int, Int)]
nqueens dim
  -- Solution for 1 queen is trivial, 2 or 3 is impossible
  | dim < 4 = []
  -- Our restriction from before
  | dim `rem` 8 /= 0 = []
  | otherwise =
      let state = blankBoard dim
       in go 0 0 state
  where
    go :: Int -> Int -> Board -> [(Int, Int)]
    go row choice state
      -- Choice ordinal exceeds dimension
      | choice == dim = []
      -- Try to see if the current choice ordinal is available and where
      | otherwise = case availableInRow state row choice of
          -- There is no choice available at this ordinal
          Nothing -> []
          Just pos ->
            if row == dim - 1
              then -- This is the last row, commit
                [(row, pos)]
              else
                let marked = markUnavailable state row pos
                 in case go (row + 1) 0 marked of
                      -- We dead-ended ourselves, go to the next choice
                      -- ordinal and retry.
                      [] -> go row (choice + 1) state
                      -- We managed, so stick this choice on.
                      next -> (row, pos) : next

Note the use of recursion here for backtracking: after we 'commit' to a position for a queen, we update our Board, then try to continue. If we succeed, we can 'assemble' the solution, and if not, we increase the choice ordinal and continue.

Running this code for a standard chessboard, we get the following solution:

ghci> nqueens 8
[(0,0),(1,4),(2,7),(3,5),(4,2),(5,6),(6,1),(7,3)]

This corresponds to the following chess board, where Q marks a queen, and _ marks an empty cell:

  0 1 2 3 4 5 6 7
0 Q _ _ _ _ _ _ _
1 _ _ _ _ Q _ _ _
2 _ _ _ _ _ _ _ Q
3 _ _ _ _ _ Q _ _
4 _ _ Q _ _ _ _ _
5 _ _ _ _ _ _ Q _
6 _ Q _ _ _ _ _ _
7 _ _ _ Q _ _ _ _

However, if we time how long this procedure takes, we will be disappointed:

    8 queens:  OK
      22.4 μs ± 1.6 μs, 180 KB allocated,   3 B  copied,  34 MB peak memory
    16 queens: OK
      4.26 ms ± 300 μs,  29 MB allocated, 900 B  copied,  34 MB peak memory
    24 queens: OK
      248  ms ± 8.1 ms, 1.6 GB allocated,  54 KB copied,  34 MB peak memory

These results are not great: both the time and space required seem to grow much faster than the input size. Ultimately, this isn't surprising, if we consider the markUnavailable function we defined above. Given a dimension of size n, we have to modify (and thus copy) something whose size is θ(n^2) each time we 'commit' to a queen placement[4]. This means our algorithm is Ω(n^3) and O(n^4), with the exact running time depending on how much backtracking we end up doing.

Can we do better than this?

Doing better

It turns out that we can: in the paper Bit-vector encoding of n-queen problem, Qiu Zongyan demonstrates a way we can avoid storing the entire board state. This involves using an observation we have already made when defining markUnavailable, which we will discuss and implement.

The key idea of the paper is that, if we place a queen in a row r and a column c, for any subsequent row, we can establish which positions are unavailable, knowing only r, c and the offset k to the subsequent row in question. More precisely, the left diagonal from this queen in row r + k will make column c - k unavailable in that row, as long as c - k >= 0; analogously, the right diagonal from this queen in row r + k will make column c + k unavailable in that row, as long as c + k < n, where n is the dimension. Furthermore, we observe that in the case of multiple queens, we can 'overlay' these observations: even if the unavailable diagonals (or indeed, any unavailable position) stemming from two different queens overlap, this doesn't matter. Once a position is marked unavailable as a result of one 'commitment' to a queen's position, it will remain unavailable no matter how many other queen 'commitments' may block it.

To see this in effect, consider the following partial board with dimension 8:

  0 1 2 3 4 5 6 7
0 _ _ _ _ _ _ _ _
1 _ _ _ _ _ _ _ _
2 _ _ _ _ _ _ _ _
3 _ _ _ _ _ _ _ _
...

Suppose we 'commit' to a queen in position (0, 2). This will make column 2 unavailable for all future placements:

  0 1 2 3 4 5 6 7
0 _ _ X _ _ _ _ _
1 _ _ X _ _ _ _ _
2 _ _ X _ _ _ _ _
3 _ _ X _ _ _ _ _
...

We will also block the left diagonal from that position:

  0 1 2 3 4 5 6 7
0 _ _ _ _ _ _ _ _
1 _ X _ _ _ _ _ _
2 X _ _ _ _ _ _ _
3 _ _ _ _ _ _ _ _
...

And the right diagonal from that position:

  0 1 2 3 4 5 6 7
0 _ _ _ _ _ _ _ _
1 _ _ _ X _ _ _ _
2 _ _ _ _ X _ _ _
3 _ _ _ _ _ X _ _
...

Consider row 3 in the context of the rules stated above: here, r = 0, c = 2, k = 3. Thus, in this row, the left diagonal of our queen placed in row 0 would make column 2 - 3 unavailable, but as this is less than 0, this diagonal has no effect on row 3. Analogously, the right diagonal of our queen from row 0 would make column 2 + 3 unavailable; this is less than 8, so that makes (3, 5) unavailable for queen placement. Our diagram above agrees with this: this suggests that we don't have to mark all subsequent rows 'ahead of time', as we do in our original approach.
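
To capture this rule as code, here is a small, hypothetical helper (not part of the solver): given the board dimension n, a committed queen's column c, and an offset k > 0 to a subsequent row, it lists the columns that queen makes unavailable in that row. For the example just discussed (n = 8, c = 2, k = 3), it yields columns 2 and 5, the left diagonal having fallen off the board.

-- Hypothetical helper, for illustration only.
blockedBelow :: Int -> Int -> Int -> [Int]
blockedBelow n c k = [col | col <- [c, c - k, c + k], col >= 0, col < n]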

To implement this, instead of maintaining an entire board, the paper suggests maintaining three bitvectors, named down, left, and right. These track the positions made unavailable in the current row by the queens placed in previous rows, specifically the column(s) directly below them, in their left diagonal(s) and right diagonal(s) respectively. Each of these has a length equal to the dimension of the problem, and all of them start with all their bits clear. We also track the current row number and choice ordinal in the same way as our previous attempt, and both start the same way.

In light of this, we use a slightly modified approach for each row:

  1. Construct a bitvector control such that for any bit position in control, that bit position is clear if the corresponding position in all of down, left, and right is clear, and is set otherwise[5].

  2. Use control and the choice ordinal to determine in which column we can place a queen for this row. If none are available, backtrack.

  3. Update down, left, and right by setting the bit corresponding to our choice from 2.

  4. 'Slide over' all bits in left by 1 bit position higher, filling in 0 at the lowest bit position. Do the same for right, but by 1 bit position lower, filling in 0 at the highest bit position.

  5. Repeat for the row below; if you get backtracked to, revert down, left, and right, and try another position. If none is available, backtrack.

To perform these steps, we will need some slightly modified operations. Firstly, to construct the control bitvector, we define an operation that effectively takes the bitwise OR of three other bitvectors:

or3 :: Bitvector -> Bitvector -> Bitvector -> Bitvector
or3 (Bitvector x) (Bitvector y) (Bitvector z) = runST $ do
  let !len = BA.sizeofByteArray x
  result <- BA.newByteArray len
  for_ [0 .. len - 1] $ \ix -> do
    let !newByte :: Word8 =
          BA.indexByteArray x ix
            .|. BA.indexByteArray y ix
            .|. BA.indexByteArray z ix
    BA.writeByteArray result ix newByte
  Bitvector <$> BA.unsafeFreezeByteArray result

The main reason we define this operation in this way is that we can avoid an unnecessary intermediate copy. If we defined a more standard bitwise OR of two bitvectors, we would end up creating an intermediate copy, only to throw it away immediately.

Since our position selection no longer requires slicing, we have a wrapper for selectIn that starts selection from the beginning and spans the entire bitvector:

select :: Bitvector -> Int -> Maybe Int
select bv@(Bitvector ba) = selectIn bv (0, BA.sizeofByteArray ba * 8)

Lastly, we need an operation that combines the bit setting and 'sliding' we need to update left and right. We describe the update operation for adjustLeft here, as adjustRight is similar (just symmetric):

adjustLeft :: Bitvector -> Int -> Bitvector
adjustLeft (Bitvector ba) i = runST $ do
  let !len = BA.sizeofByteArray ba
  thawed <- BA.thawByteArray ba 0 len
  setBit' thawed len i
  let !topBitMask :: Word8 = 0x80
  for_ [0, 1 .. len - 2] $ \ix -> do
    !currentByte :: Word8 <- BA.readByteArray thawed ix
    !nextByte :: Word8 <- BA.readByteArray thawed (ix + 1)
    let !nextOverflowBit = nextByte .&. topBitMask
    let !newCurrentByte = unsafeShiftL currentByte 1 .|. unsafeShiftR nextOverflowBit 7
    BA.writeByteArray thawed ix newCurrentByte
  let !lastIx = len - 1
  !lastByte :: Word8 <- BA.readByteArray thawed lastIx
  BA.writeByteArray thawed lastIx (unsafeShiftL lastByte 1)
  Bitvector <$> BA.unsafeFreezeByteArray thawed

The bit setting helper is re-used from our definition of setBit and setBits. The 'sliding' operation is a little bit more difficult, as we have to move bits across byte boundaries. To make this work, we use a loop that looks not only at a 'current' position but also at the position at the next index. We then use a mask to extract the top bit of the next byte, then use a combination of shift and OR to construct a new byte, based on the current byte and the bit extracted. The only exception is the last byte, as this just requires 'filling in' a zero bit, which a shift can already do. This is important, as this ensures that left diagonals 'fall off' when the row number gets large enough.
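
For completeness, here is a possible adjustRight, mirroring adjustLeft; the article only describes it as symmetric, so this is a sketch under that description, reusing the setBit' helper from before. The bits slide one position lower, a zero is filled in at the highest bit position, and the lowest bit falls off.

adjustRight :: Bitvector -> Int -> Bitvector
adjustRight (Bitvector ba) i = runST $ do
  let !len = BA.sizeofByteArray ba
  thawed <- BA.thawByteArray ba 0 len
  setBit' thawed len i
  let !bottomBitMask :: Word8 = 0x01
  -- Walk from the last byte towards the first, pulling each byte's
  -- incoming top bit from the bottom bit of the byte before it.
  for_ [len - 1, len - 2 .. 1] $ \ix -> do
    !currentByte :: Word8 <- BA.readByteArray thawed ix
    !prevByte :: Word8 <- BA.readByteArray thawed (ix - 1)
    let !prevUnderflowBit = prevByte .&. bottomBitMask
    let !newCurrentByte = unsafeShiftR currentByte 1 .|. unsafeShiftL prevUnderflowBit 7
    BA.writeByteArray thawed ix newCurrentByte
  -- The first byte just fills in a zero at the highest bit position.
  !firstByte :: Word8 <- BA.readByteArray thawed 0
  BA.writeByteArray thawed 0 (unsafeShiftR firstByte 1)
  Bitvector <$> BA.unsafeFreezeByteArray thawed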

Putting all these together, we have the following, improved nqueens:

nqueens :: Int -> [(Int, Int)]
nqueens dim
  | dim < 4 = []
  | dim `rem` 8 /= 0 = []
  | otherwise =
      let down = zeroes dim
          left = zeroes dim
          right = zeroes dim
       in go 0 0 down left right . zeroes $ dim
  where
    -- We hoist the control vector into the arguments, as this way, we won't be
    -- forced to reconstruct it on each backtrack.
    go ::
      Int ->
      Int ->
      Bitvector ->
      Bitvector ->
      Bitvector ->
      Bitvector ->
      [(Int, Int)]
    go row choice down left right control
      -- Choice ordinal matches dimension
      | choice == dim = []
      -- Try to see if the current choice ordinal is available and where
      | otherwise = case select control choice of
          -- No choice here
          Nothing -> []
          Just pos ->
            if row == dim - 1
              then -- this is the last row, commit
                [(row, pos)]
              else
                let newDown = setBit down pos
                    newLeft = adjustLeft left pos
                    newRight = adjustRight right pos
                    newControl = or3 newDown newLeft newRight
                 in case go (row + 1) 0 newDown newLeft newRight newControl of
                      -- We dead-ended ourselves, go to the next choice ordinal
                      -- and retry
                      [] -> go row (choice + 1) down left right control
                      next -> (row, pos) : next

We see that the basic structure of the solution is the same. The only difference is that the state we maintain is different, and the way we convert a choice ordinal into a position is slightly different. We also no longer need a special data type for boards, as modifications to the bitvectors we work with don't need the extra information.

Even in theory, this implementation should be more efficient: we now require only θ(n) copying (and work) at each step of the search, as we no longer maintain an entire board. Thus, the entire search is Ω(n^2) and O(n^3), depending on how much backtracking we have to do. This is an asymptotic improvement in both time and space over our initial attempt.

However, does this actually help in practice?

    8 queens:  OK
      8.77 μs ± 201 ns,  40 KB allocated,   1 B  copied,  34 MB peak memory
    16 queens: OK
      942  μs ±  28 μs, 3.6 MB allocated, 223 B  copied,  34 MB peak memory
    24 queens: OK
      41.2 ms ± 2.1 ms, 159 MB allocated,  10 KB copied,  34 MB peak memory

Yes, significantly! Even the smallest case is over two and a half times faster (and allocates almost five times less memory), and for larger cases, the improvement is much greater. We can also see that the amount copied is smaller for our solution; this is likely because we avoid the list-passing from markUnavailable. We still, unfortunately, see the non-linear growth from before, but it is slower, which is consistent with the analysis predicting an asymptotic improvement.

Could we do even better than this? Not asymptotically, at least: due to our row-by-row approach, we have to accept a factor of n even in the best case (and as much as n^2 in the worst), and our requirement for backtracking means that it's difficult to see how we could need less than linear time or space at each search step. While it might be possible to use a structure that doesn't require copying in its entirety (such as some kind of red-black tree), it's unlikely that we could find a structure that efficiently supports all the operations we need (select being particularly tricky) while also not having large constant factors in its memory use. This is one advantage of a bitwise representation: while the copying is linear, the amount of memory copied is small and compact, which makes it significantly cheaper. While we could potentially reduce the constant factors at play here, asymptotically there aren't really improvements to be had.

The Plutus bit

This article came from the author's experience implementing CIP-122 and CIP-123, as well as the n-queens benchmark, in Plutus. In this context, the kind of bitwise-oriented code we described above has advantages above and beyond what would be the case in regular Haskell. This is due to the severe limitations on memory imposed on any Plutus script; thus, succinct (or in our case, implicit) data structures go from being a luxury to practically a necessity. Furthermore, while we've had a compact byte-oriented structure available to us in Plutus from the very beginning (in the form of BuiltinByteString), until the work by MLabs, there was no reasonable way to work with individual bits[6].

With CIP-122 and CIP-123, the kind of code we described above is possible in Plutus scripts as well. While it is unlikely that any script would want an 8-queens solver, the ability to define and use succinct data structures in such Plutus scripts enables work that was previously either impractical or impossible. We had this kind of problem - and this kind of use case - in mind when we wrote CIP-122 and CIP-123. We also provided a version of the second approach to solving n-queens as a benchmark for Plutus[7], both to demonstrate the effectiveness of these operations and to provide a guide to others for how they could be used.

Your Project, Our Expertise

As we continue to explore innovative solutions like the bitwise operations discussed in this article, we at MLabs are dedicated to transforming ideas into reality. Whether optimizing existing work or building from scratch, we assist teams at any stage of their project journey. As a leading AI and blockchain consultancy, we pride ourselves on helping clients tackle complex projects with outstanding project management and developer support.

Our mission is to empower businesses with innovative, secure, and mission-critical software solutions that deliver exceptional value and drive growth. We have a proven track record of building robust projects tailored to client needs.

If you have an idea or project that could benefit from our expertise, we invite you to contact us today. Discover how MLabs can turn your vision into a successful reality.


  1. In fact, GHC itself uses it as part of its nofib benchmarking suite.
  2. This is not a significant restriction and can be worked around. We don't do it here because it would make the bitwise operations significantly more fiddly, and wouldn't help the exposition at all.
  3. This is because threatening is symmetric: if a queen threatens us, we also threaten her, and since we've marked all positions that prior-placed queens could threaten, if we avoid them, we can't threaten them either.
  4. Working mutably won't save us here. Due to backtracking, we have to retain the prior version of our state each time we recurse, possibly as deep as n recursions!
  5. Technically, we don't even need left and right to construct control, as knowing down, as well as the current row number, is enough to reconstruct left and right on demand. However, this is unlikely to be of much benefit: we exchange less stored state for significantly more work. We might be able to do less copying, but it's not clear if this would even benefit us much.
  6. This is essentially the case we made as part of CIP-122 and CIP-123.
  7. The only differences are due to the slightly different operations available to us. Conceptually, the implementation is the same.