Signed overflow with a 64-bit int when Leetcode guarantees the answer will never overflow a 32-bit int

- Hemil Ruparel 15 August, 2022

Welcome to another bug-hunt crime thriller. This time around, the bug took about an hour and a half to find, and towards the end I also had a friend helping. Hope you enjoy :)

Happy Independence Day everyone!

Being an Indian, it's my duty to start this blog by wishing everyone a Happy Independence Day. Today, 15th August 2022, marks the 76th Independence Day and 75 years of independence. (It's not off by one. Our first Independence Day was 15th August, 1947. We had been independent for 0 years then.)

Conclusion

Leetcode was right. The program did overflow, and it was a bug in the code. The final result was within the range of a 32-bit int, but the intermediate results of the computation were overflowing.

The problem

I am a member and moderator of CppIndia. The user Radical Ninja posted this leetcode problem he was trying to solve on the discussion channel. He wanted help finding the overflow. I didn't bother much, to be honest. On 14th August, the post piqued Ankur Satle's curiosity.

Ankur mentions that the line dp[i] += dp[i - ele]; is causing the issue. Radical Ninja acknowledges that is the line causing the overflow. But there is some disconnect: given that the inputs are so small, how could it ever overflow a 64-bit int when the answer is guaranteed to fit in a 32-bit int?

#include <cstdint>
#include <iostream>
#include <vector>
int combinationSum4(std::vector<int> nums, int target) {
    std::vector<int64_t> dp{};
    dp.resize(target + 1);
    dp[0] = 1;

    for (long long i = 1; i < dp.size(); ++i) {
        for (const auto &ele : nums) {
            if (i - ele < 0) {
                continue;
            }
            dp[i] += dp[i - ele];
        }
    }

    return dp[target];
}

int main() {
    std::cout << combinationSum4(
            {10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210, 220, 230,
             240, 250, 260, 270, 280, 290, 300, 310, 320, 330, 340, 350, 360, 370, 380, 390, 400, 410, 420, 430, 440,
             450, 460, 470, 480, 490, 500, 510, 520, 530, 540, 550, 560, 570, 580, 590, 600, 610, 620, 630, 640, 650,
             660, 670, 680, 690, 700, 710, 720, 730, 740, 750, 760, 770, 780, 790, 800, 810, 820, 830, 840, 850, 860,
             870, 880, 890, 900, 910, 920, 930, 940, 950, 960, 970, 980, 990, 111},
             999);
}

I have slightly modified the code but not in a way that changes its meaning.

This is the code with the data that overflowed. It's C++, but if you know any other programming language, you can most probably read it. std::vector is a dynamic array, i.e. an array that resizes itself to fit its contents, like a Python list or an ArrayList in Java.

Note: for some strange reason, this code works if we change the std::vector to use uint32_t, which is a 32-bit unsigned integer.

First approach - Run the code

I said, well, let's try running the code first to see if it gives the correct answer. I run the code and it prints 1. I ask Radical Ninja, whom I will call RN from now on as his name is too long to type, whether that is right. RN does not know the correct answer.

I then compile the code with ubsan. The command line is as follows:

g++ -fsanitize=undefined -g3 test.cpp

-fsanitize=undefined turns on UBSan. -g3 turns on full debugging information. Now I ran the code again and UBSan caught the overflow. It was at the line dp[i] += dp[i - ele]. UBSan reported I was trying to add 1 to 9223372036854775807, which, if you notice, is exactly the maximum value a signed 64-bit int can hold in two's complement representation.

The program definitely overflows, but I found the value really weird. How is it exactly the max value a 64-bit int can represent?

Break on overflow in GDB

My next approach is to break when the overflow happens and play around. After a bit of googling, I found this stackoverflow post which was exactly what I wanted.

The gist of the post is: ubsan has a bunch of functions __ubsan_handle_<foo> where foo is some undefined behaviour to handle that undefined behaviour. The post suggests the following:

(gdb) rbreak ^__ubsan_handle_
(gdb) commands
(gdb) finish
(gdb) end

rbreak, for those of you who don't know, sets breakpoints by regular expression: GDB will set a breakpoint on every function whose name begins with __ubsan_handle_. finish ensures the report is printed first and only then does the debugger get control. I honestly don't know how, but that's in the answer.

I follow the commands in GDB and then run the code. It hits the breakpoint at __ubsan_handle_dynamic_type_cache_miss@plt for some reason. I don't know why, but that's definitely not what we are looking for. We want to see an overflow bug. So I press c to continue. Now I hit another breakpoint, __ubsan_handle_add_overflow@plt. So we are in the right place. I check the backtrace. The overflow definitely happened in the combinationSum4 function. The vector is 100 elements long, and the target is 999. Hmmm.

Investigation

Right now, I am in the UBSan handler. I type f 1 to get into my function, i.e. combinationSum4. I print the value of i; it is 640. I print the value of ele; it is 640. Interesting. I print the value of dp[i - ele]; it is 1. I then print dp[i]; it is 9223372036854775807. Ah, there is our overflow.

I had heard about this feature of GDB called watchpoints: it will break into the code whenever a variable is modified. I set a watchpoint for dp[640] to see when it is modified. 30 seconds pass and the code is still running. I terminate GDB. Watchpoints are too slow to be useful on my hardware. Maybe GDB has to emulate them because my processor doesn't support them. Hmm. Let's set a conditional breakpoint instead and see when dp[640] is modified.

I set a conditional breakpoint on the line causing the overflow, firing when i is 640, using b 17 if i == 640, and run the program again. (It was line 17. As I said earlier, this is not the exact code.) GDB stops this time within a second. That's nice. I check the value of i just to be sure; it's 640 as expected. I check the value of ele; it's 10. Now I print dp[i - ele]; it's 4611686018427387904. Offhand, it looks like exactly half of the value that overflowed. At this point, I didn't check whether my hypothesis was true.

If you look back at the code, I am adding the value of dp[i - ele] to dp[i]. So I want to trace how I ended up with such a large value. I set a breakpoint on the same line with the condition i == 630 (i - ele, i.e. 640 - 10) and run the program again. The program hits the breakpoint and I print the value of ele. ele is 10, and dp[i - ele] is 2305843009213693952. Again, offhand, this seems like half of the previous value. I again set a breakpoint for i == 620 (630 - 10) and run the program again. ele is again 10. dp[i - ele] is 1152921504606846976, which again, offhand, is half of the previous value. Something is really going on here, but I can't pinpoint exactly what.

Something is definitely going on here. I don't know what. There is definitely some error in the code. Leetcode is correct: there is definitely an overflow. RN is confused, because the code gets accepted when you change the type of dp from std::vector<int64_t> to std::vector<uint32_t>, even though int64_t has a far larger range than uint32_t, which is an unsigned 32-bit int. My rough guess: unsigned int overflow is defined to wrap, so UBSan does not catch that. Maybe, just maybe, in this case it wraps in just the right way to still give the correct answer. My next move: the code does not seem to be accessing the array out of bounds, but maybe try running AddressSanitizer as well? I try it. AddressSanitizer (ASan) finds nothing.

All of this back and forth is going on over Discord. I invite RN to hop on a call in 30 minutes (which, I realise right now while writing this article, became 3 minutes due to a typo). RN is waiting for me while I am busy eating my dinner. Ankur is meanwhile on voice chat with RN, trying to understand what's going on. He is confused too: uint32_t has a much smaller range than int64_t, so it should overflow too if int64_t overflows. He asks if the solution is accepted with uint32_t, and RN confirms it is.

Pairing with RN and for a short while Ankur

I finish my dinner and send a Google Meet link. RN joins, but his mic isn't working. I text him on Discord that I can't hear him. He asks me to join the voice channel where Ankur already is. I join it, share my screen, and show my current progress. Ankur drops off for some work (or dinner, who knows?).

I am doing all of this in my terminal. RN asks if I have a visual IDE. I promptly open CLion and we start single-stepping through the code. 10-15 iterations go by, but we see no sign of the exponential growth. I have an idea: I set a conditional breakpoint for when dp[i - ele] is greater than 10000, because all the numbers I had seen so far were below 1000, so I guessed the values should never exceed 10k. This time when the breakpoint is hit, dp[i - ele] is 16384, which, if you look closely, is again a perfect power of 2. Something is cooking. I don't know what. I print i; it is 160.

Remember, at this point I still have no idea how the code works or solves the given problem. I am still just playing around and seeing what's going on. So I ask RN to explain the solution. He sends me the link to Errichto's explanation. We watch it, and I understand the idea: the number of ways in which you can make X is the sum, over every element ele, of the number of ways in which you can make X - ele.

We again start stepping through the code and setting breakpoints. I am debating with RN why he did not use recursive top-down DP. He explains that it's not always viable. I argue it's a lot easier to understand: just write a recursive solution and memoize. He says yeah, no; when I solve more problems, I will understand. I am like, fine. Whatever.

By now, we have been on the call for about an hour. While still running the program under the debugger, I pause and decide to check the values in dp. When I expand dp, I notice a pattern. dp[0] is 1, because that was set in the program. dp[10] is 1, dp[20] is 2, dp[30] is 4, dp[40] is 8, dp[50] is 16, and dp[60] is 32. Wait. Those look like powers of 2! EUREKA! dp holds the next power of 2 every 10 elements. While I was trying to calculate which power would overflow, RN found a solution: ignore the overflow.

RN finds the solution

Ignore the overflow! That's the solution. He looks at the input data. The target is 999. The only element in the array you could use to make 999 was 111; all the others were multiples of 10, and there is no way to combine multiples of 10 to make 999. That's it!

This is the corrected code:

#include <cstdint>
#include <limits>
#include <vector>

int combinationSum4(std::vector<int> nums, int target) {
    std::vector<int64_t> dp{};
    dp.resize(target + 1);
    dp[0] = 1;

    for (long long i = 1; i < dp.size(); ++i) {
        for (const auto &ele : nums) {
            // Skip the element if it is too big, or if adding it would overflow dp[i]
            if (i - ele < 0 || std::numeric_limits<int64_t>::max() - dp[i - ele] < dp[i]) {
                continue;
            }
            dp[i] += dp[i - ele];
        }
    }

    return dp[target];
}

The reason this works is that the element that overflows is not used in the answer anyway. So, who cares? Now it all starts to make sense.

Everything starts to make sense

What was happening was this: every 10 elements, we were getting a new power of 2 in dp. A 64-bit int overflows after 64 doublings in two's complement representation, and we were getting a new power of 2 every 10 indices. That's why we overflowed at 640, if you remember. And my guess that the number at 630 was exactly half of the number at 640 was indeed correct; 620 was likewise half of 630.

Why does the code work when we use uint32_t then?

Unsigned overflow is defined in C++ to wrap around, so UBSan does not catch it, neither on Leetcode nor in our local debugging. The overflow is still there when we use uint32_t, but that element is not used for the final answer. The code probably wraps uint32_t many times during execution, but those values never feed into the final answer, so who cares?