Ask HN: How is it possible to get -0.0 in a sum?

I'm looking for corner cases where he result is -0.0. As far as I know, the only way to get -0.0 in a sum is

  (-0.0) + (-0.0)
Does someone know any other case in IEEE 754?

Bonus question: What happens in subtractions? I only know

  (-0.0) - (+0.0)
Is there any other case?

10 points | by gus_massa 1 day ago

4 comments

  • sparkie 1 day ago
    It depends on the FP rounding mode. If rounding mode is FE_TOWARDZERO/FE_UPWARD/FE_TONEAREST then the case you gave is the only one I'm aware of. If rounding mode is FE_DOWNWARD (towards negative infinity) then other calculations that result in a zero will give a -0.0.

    Here's an example of -1.0f + 1.0f resulting in -0.0: https://godbolt.org/z/5qvqsdh9P

    • gus_massa 14 hours ago
      Thanks! [Sorry for the delay.]

      ---

      FYI: For more context, I'm trying to send a PR to Chez Scheme (and indirectly to Racket) https://github.com/cisco/ChezScheme/pull/959 to reduce expressions like

        (+ 1.0 (length L))  ;  ==>  (+ 1.0 (fixnum->flonum (length L)))
      
      where the "fixnums" are small integers and "flonums" are double.

      It's fine, unless you have the case

        (+ -0.0 (length L))  ;  =wrong=>  (+ -0.0 (fixnum->flonum (length L)))
      
      because if the length is 0, it get's transformed into 0.0 instead of -0.0

      There are a few corner cases, in particular because it's possible to have

         (+ 1.0 x (length L))
      
      and I really want to avoid the runtime check of (length L) == 0 if possible.

      So I took a look, asked there, and now your opinion confirms what I got so far. My C is not very good, so it's nice to have a example of how the rounding directions are used. Luckily Chez Scheme only uses the default rounding and it's probably correct to cut a few corners. I'll take a looks for a few days in case there is some surprise.

      • sparkie 1 hour ago
        I'm not sure you can avoid the check, but you can avoid a branch.

        An AVX-512 extension has a `vfixupimm` instruction[1] which can adjust special floating point values. You could use this to adjust all zeroes to -0 but leave any non-zeroes untouched. It isn't very obvious how to use though.

            vfixupimmsd dst, src, fixup, flag
        
         * The `flag` is for error reporting - we can set it to zero to ignore errors.
        
         * `dst` and `src` are a floating point value - they can be the same register.
        
         * The instruction first checks `src` and turns any denormals into zero if the MXCSR.DAZ flag is set.
        
         * It then categorizes `src` as one of {QNAN, SNAN, ZERO, ONE, NEG_INF, POS_ING, NEG_VALUE, POS_VALUE}
        
         * `fixup` is an array of 8 nybbles (a 32-bit int) and is looked up based on the categorization of `src` {QNAN = 0 ... POS_VALUE = 7}
        
         * The values of each nybble denote which value to place into `dst`:
        
            0x0 : dst (unchanged)
            0x1 : src (with denormals as zero if MXCSR.DAZ is set)
            0x2 : QNaN(src)
            0x3 : QNAN_Indefinite
            0x4 : -INF
            0x5 : +INF
            0x6 : src < 0 ? -INF : +INF
            0x7 : -0
            0x8 : +0
            0x9 : -1
            0xA : +1
            0xB : 1/2
            0xC : 90.0
            0xD : PI/2
            0xE : MAX_FLOAT
            0xF : -MAX_FLOAT
        
        You want to set the nybble for categorization ZERO (bits 11..8) to 0x7 (-0) in `fixup`. This would mean you want `fixup` to be equal to `0x00000700`. So usage would be:

            double fixnum_to_flonum(int64_t fixnum) {
                __m128d result = { (double)fixnum };
                return _mm_cvtsd_f64(_mm_fixupimm_sd(result, result, (__m128i){ 0x700 }, 0));
            }
        
        Which compiles to just 4 instructions, with no branches:

            .FIXUP:
                .long   1792                            # 0x700
                .long   0                               # 0x0
                .long   0                               # 0x0
                .long   0                               # 0x0
            fixnum_to_flonum:
                vcvtsi2sd       xmm0, xmm0, rdi
                vmovq           xmm0, xmm0
                vfixupimmsd     xmm0, xmm0, qword ptr [rip + .FIXUP], 0
                ret
        
        It can be extended to operate on 8 int64->double at a time (__m512d) with little extra cost.

        You could maybe use this optimization where the instruction is available and just stick with a branch version otherwise, or figure out some other way to make it branchless - though I can't think of any other way which would be any faster than a branch.

        [1]:https://www.intel.com/content/www/us/en/docs/intrinsics-guid...

  • gethly 1 day ago
    i would guess that because of how *** * floats are in binary computers, you have something like -0.0000000000000000000000000000000000001 and when you round it you end up with -0.0. Same goes for positive value, you're just not used to write the + sign before every number, so seeing the minus feels strange.
    • dcminter 14 hours ago
      You're answering a question that OP did not ask.
  • kazinator 1 day ago
    What happens if we take the smallest (as in closest to zero) negative subnormal and add it to itself?