Full Code of yohanes/teensy-u2f for AI

master 2c6ac3780fcc cached
17 files
299.2 KB
105.9k tokens
161 symbols
1 requests
Download .txt
Showing preview only (309K chars total). Download the full file or copy to clipboard to get everything.
Repository: yohanes/teensy-u2f
Branch: master
Commit: 2c6ac3780fcc
Files: 17
Total size: 299.2 KB

Directory structure:
gitextract_0rtdnj14/

├── LICENSE
├── LICENSE-micro-ecc.txt
├── README.md
├── u2f/
│   ├── Makefile.desktop
│   ├── asm_arm.h
│   ├── asm_arm_mult_square.h
│   ├── curve-specific.h
│   ├── desktop_test.cpp
│   ├── platform-specific.h
│   ├── sha256.c
│   ├── sha256.h
│   ├── types.h
│   ├── u2f.ino
│   ├── uECC.c
│   ├── uECC.h
│   └── uECC_vli.h
└── usb_desc.h

================================================
FILE CONTENTS
================================================

================================================
FILE: LICENSE
================================================
Copyright (c) 2015, Yohanes Nugroho
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

* Redistributions of source code must retain the above copyright notice, this
  list of conditions and the following disclaimer.

* Redistributions in binary form must reproduce the above copyright notice,
  this list of conditions and the following disclaimer in the documentation
  and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.



================================================
FILE: LICENSE-micro-ecc.txt
================================================
Copyright (c) 2014, Kenneth MacKay
All rights reserved.

Redistribution and use in source and binary forms, with or without modification,
are permitted provided that the following conditions are met:
 * Redistributions of source code must retain the above copyright notice, this
   list of conditions and the following disclaimer.
 * Redistributions in binary form must reproduce the above copyright notice,
   this list of conditions and the following disclaimer in the documentation
   and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR
ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON
ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.


================================================
FILE: README.md
================================================
teensy-u2f
==========

U2F implementation for Teensy LC. 

This implementation is simple, works, but a bit insecure in the key handle generation part and user presence check. 

The key handle is generated from private key  XOR-ed  with a simple fixed key (note: key handle generation is outside of U2F implementation scope). Attacker that knows the fixed key (or able to deduce it from multiple registrations request) can get the private key sign any authentication request although in practice its not that easy to perform this attack.

Because there is no user button in Teensy LC, I didn't actually implement any button handling logic for 'user-presence' check. On first request this implementation will assume button is not pressed, and on next request it will assume the user has pressed the button. When logging in to a website, you may need to unplug and replug the Teensy LC.

For the ECDSA key generation and signing this implementation uses the micro-ecc library:

<https://github.com/kmackay/micro-ecc>


License
-------

See LICENSE.txt


================================================
FILE: u2f/Makefile.desktop
================================================

all: desktop_test

uECC.o	: uECC.c
	gcc -Wall -c uECC.c

desktop_test: desktop_test.cpp sha256.c u2f.ino uECC.o
	      g++ -Wall -DIS_DESKTOP_TEST=1 desktop_test.cpp sha256.c uECC.o  -o desktop_test 
	      

================================================
FILE: u2f/asm_arm.h
================================================
/* Copyright 2015, Kenneth MacKay. Licensed under the BSD 2-clause license. */

#ifndef _UECC_ASM_ARM_H_
#define _UECC_ASM_ARM_H_

#include "asm_arm_mult_square.h"

#if (uECC_SUPPORTS_secp256r1 || uECC_SUPPORTS_secp256k1)
    #define uECC_MIN_WORDS 8
#endif
#if uECC_SUPPORTS_secp224r1
    #undef uECC_MIN_WORDS
    #define uECC_MIN_WORDS 7
#endif
#if uECC_SUPPORTS_secp192r1
    #undef uECC_MIN_WORDS
    #define uECC_MIN_WORDS 6
#endif
#if uECC_SUPPORTS_secp160r1
    #undef uECC_MIN_WORDS
    #define uECC_MIN_WORDS 5
#endif

#if (uECC_PLATFORM == uECC_arm_thumb)
    #define REG_RW "+l"
    #define REG_WRITE "=l"
#else
    #define REG_RW "+r"
    #define REG_WRITE "=r"
#endif

#if (uECC_PLATFORM == uECC_arm_thumb || uECC_PLATFORM == uECC_arm_thumb2)
    #define REG_RW_LO "+l"
    #define REG_WRITE_LO "=l"
#else
    #define REG_RW_LO "+r"
    #define REG_WRITE_LO "=r"
#endif

#if (uECC_PLATFORM == uECC_arm_thumb2)
    #define RESUME_SYNTAX
#else
    #define RESUME_SYNTAX ".syntax divided \n\t"
#endif

#if (uECC_OPTIMIZATION_LEVEL >= 2)

uECC_VLI_API uECC_word_t uECC_vli_add(uECC_word_t *result,
                                      const uECC_word_t *left,
                                      const uECC_word_t *right,
                                      wordcount_t num_words) {
#if (uECC_MAX_WORDS != uECC_MIN_WORDS)
  #if (uECC_PLATFORM == uECC_arm_thumb) || (uECC_PLATFORM == uECC_arm_thumb2)
    uint32_t jump = (uECC_MAX_WORDS - num_words) * 4 * 2 + 1;
  #else /* ARM */
    uint32_t jump = (uECC_MAX_WORDS - num_words) * 4 * 4;
  #endif
#endif
    uint32_t carry;
    uint32_t left_word;
    uint32_t right_word;
    
    __asm__ volatile (
        ".syntax unified \n\t"
        "movs %[carry], #0 \n\t"
    #if (uECC_MAX_WORDS != uECC_MIN_WORDS)
        "adr %[left], 1f \n\t"
        ".align 4 \n\t"
        "adds %[jump], %[left] \n\t"
    #endif
        
        "ldmia %[lptr]!, {%[left]} \n\t"
        "ldmia %[rptr]!, {%[right]} \n\t"
        "adds %[left], %[right] \n\t"
        "stmia %[dptr]!, {%[left]} \n\t"
        
    #if (uECC_MAX_WORDS != uECC_MIN_WORDS)
        "bx %[jump] \n\t"
    #endif
        "1: \n\t"
        REPEAT(DEC(uECC_MAX_WORDS),
            "ldmia %[lptr]!, {%[left]} \n\t"
            "ldmia %[rptr]!, {%[right]} \n\t"
            "adcs %[left], %[right] \n\t"
            "stmia %[dptr]!, {%[left]} \n\t")
        
        "adcs %[carry], %[carry] \n\t"
        RESUME_SYNTAX
        : [dptr] REG_RW_LO (result), [lptr] REG_RW_LO (left), [rptr] REG_RW_LO (right),
    #if (uECC_MAX_WORDS != uECC_MIN_WORDS)
          [jump] REG_RW_LO (jump),
    #endif
          [carry] REG_WRITE_LO (carry), [left] REG_WRITE_LO (left_word),
          [right] REG_WRITE_LO (right_word)
        :
        : "cc", "memory"
    );
    return carry;
}
#define asm_add 1

uECC_VLI_API uECC_word_t uECC_vli_sub(uECC_word_t *result,
                                      const uECC_word_t *left,
                                      const uECC_word_t *right,
                                      wordcount_t num_words) {
#if (uECC_MAX_WORDS != uECC_MIN_WORDS)
  #if (uECC_PLATFORM == uECC_arm_thumb) || (uECC_PLATFORM == uECC_arm_thumb2)
    uint32_t jump = (uECC_MAX_WORDS - num_words) * 4 * 2 + 1;
  #else /* ARM */
    uint32_t jump = (uECC_MAX_WORDS - num_words) * 4 * 4;
  #endif
#endif
    uint32_t carry;
    uint32_t left_word;
    uint32_t right_word;
    
    __asm__ volatile (
        ".syntax unified \n\t"
        "movs %[carry], #0 \n\t"
    #if (uECC_MAX_WORDS != uECC_MIN_WORDS)
        "adr %[left], 1f \n\t"
        ".align 4 \n\t"
        "adds %[jump], %[left] \n\t"
    #endif
        
        "ldmia %[lptr]!, {%[left]} \n\t"
        "ldmia %[rptr]!, {%[right]} \n\t"
        "subs %[left], %[right] \n\t"
        "stmia %[dptr]!, {%[left]} \n\t"
        
    #if (uECC_MAX_WORDS != uECC_MIN_WORDS)
        "bx %[jump] \n\t"
    #endif
        "1: \n\t"
        REPEAT(DEC(uECC_MAX_WORDS),
            "ldmia %[lptr]!, {%[left]} \n\t"
            "ldmia %[rptr]!, {%[right]} \n\t"
            "sbcs %[left], %[right] \n\t"
            "stmia %[dptr]!, {%[left]} \n\t")
        
        "adcs %[carry], %[carry] \n\t"
        RESUME_SYNTAX
        : [dptr] REG_RW_LO (result), [lptr] REG_RW_LO (left), [rptr] REG_RW_LO (right),
    #if (uECC_MAX_WORDS != uECC_MIN_WORDS)
          [jump] REG_RW_LO (jump),
    #endif
          [carry] REG_WRITE_LO (carry), [left] REG_WRITE_LO (left_word),
          [right] REG_WRITE_LO (right_word)
        :
        : "cc", "memory"
    );
    return !carry; /* Note that on ARM, carry flag set means "no borrow" when subtracting
                      (for some reason...) */
}
#define asm_sub 1

#endif /* (uECC_OPTIMIZATION_LEVEL >= 2) */

#if (uECC_OPTIMIZATION_LEVEL >= 3)

#define FAST_MULT_ASM_5_TO_6                 \
    "cmp r3, #5 \n\t"                        \
    "beq 1f \n\t"                            \
                                             \
    /* r4 = left high, r5 = right high */    \
    "ldr r4, [r1] \n\t"                      \
    "ldr r5, [r2] \n\t"                      \
                                             \
    "sub r0, #20 \n\t"                       \
    "sub r1, #20 \n\t"                       \
    "sub r2, #20 \n\t"                       \
                                             \
    "ldr r6, [r0] \n\t"                      \
    "ldr r7, [r1], #4 \n\t"                  \
    "ldr r8, [r2], #4 \n\t"                  \
    "mov r14, #0 \n\t"                       \
    "umull r9, r10, r4, r8 \n\t"             \
    "umull r11, r12, r5, r7 \n\t"            \
    "adds r9, r9, r6 \n\t"                   \
    "adc r10, r10, #0 \n\t"                  \
    "adds r9, r9, r11 \n\t"                  \
    "adcs r10, r10, r12 \n\t"                \
    "adc r14, r14, #0 \n\t"                  \
    "str r9, [r0], #4 \n\t"                  \
                                             \
    "ldr r6, [r0] \n\t"                      \
    "adds r10, r10, r6 \n\t"                 \
    "adcs r14, r14, #0 \n\t"                 \
    "ldr r7, [r1], #4 \n\t"                  \
    "ldr r8, [r2], #4 \n\t"                  \
    "mov r9, #0 \n\t"                        \
    "umull r11, r12, r4, r8 \n\t"            \
    "adds r10, r10, r11 \n\t"                \
    "adcs r14, r14, r12 \n\t"                \
    "adc r9, r9, #0 \n\t"                    \
    "umull r11, r12, r5, r7 \n\t"            \
    "adds r10, r10, r11 \n\t"                \
    "adcs r14, r14, r12 \n\t"                \
    "adc r9, r9, #0 \n\t"                    \
    "str r10, [r0], #4 \n\t"                 \
                                             \
    "ldr r6, [r0] \n\t"                      \
    "adds r14, r14, r6 \n\t"                 \
    "adcs r9, r9, #0 \n\t"                   \
    "ldr r7, [r1], #4 \n\t"                  \
    "ldr r8, [r2], #4 \n\t"                  \
    "mov r10, #0 \n\t"                       \
    "umull r11, r12, r4, r8 \n\t"            \
    "adds r14, r14, r11 \n\t"                \
    "adcs r9, r9, r12 \n\t"                  \
    "adc r10, r10, #0 \n\t"                  \
    "umull r11, r12, r5, r7 \n\t"            \
    "adds r14, r14, r11 \n\t"                \
    "adcs r9, r9, r12 \n\t"                  \
    "adc r10, r10, #0 \n\t"                  \
    "str r14, [r0], #4 \n\t"                 \
                                             \
    "ldr r6, [r0] \n\t"                      \
    "adds r9, r9, r6 \n\t"                   \
    "adcs r10, r10, #0 \n\t"                 \
    "ldr r7, [r1], #4 \n\t"                  \
    "ldr r8, [r2], #4 \n\t"                  \
    "mov r14, #0 \n\t"                       \
    "umull r11, r12, r4, r8 \n\t"            \
    "adds r9, r9, r11 \n\t"                  \
    "adcs r10, r10, r12 \n\t"                \
    "adc r14, r14, #0 \n\t"                  \
    "umull r11, r12, r5, r7 \n\t"            \
    "adds r9, r9, r11 \n\t"                  \
    "adcs r10, r10, r12 \n\t"                \
    "adc r14, r14, #0 \n\t"                  \
    "str r9, [r0], #4 \n\t"                  \
                                             \
    "ldr r6, [r0] \n\t"                      \
    "adds r10, r10, r6 \n\t"                 \
    "adcs r14, r14, #0 \n\t"                 \
    /* skip past already-loaded (r4, r5) */  \
    "ldr r7, [r1], #8 \n\t"                  \
    "ldr r8, [r2], #8 \n\t"                  \
    "mov r9, #0 \n\t"                        \
    "umull r11, r12, r4, r8 \n\t"            \
    "adds r10, r10, r11 \n\t"                \
    "adcs r14, r14, r12 \n\t"                \
    "adc r9, r9, #0 \n\t"                    \
    "umull r11, r12, r5, r7 \n\t"            \
    "adds r10, r10, r11 \n\t"                \
    "adcs r14, r14, r12 \n\t"                \
    "adc r9, r9, #0 \n\t"                    \
    "str r10, [r0], #4 \n\t"                 \
                                             \
    "umull r11, r12, r4, r5 \n\t"            \
    "adds r11, r11, r14 \n\t"                \
    "adc r12, r12, r9 \n\t"                  \
    "stmia r0!, {r11, r12} \n\t"

#define FAST_MULT_ASM_6_TO_7                    \
    "cmp r3, #6 \n\t"                           \
    "beq 1f \n\t"                               \
                                                \
    /* r4 = left high, r5 = right high */       \
    "ldr r4, [r1] \n\t"                         \
    "ldr r5, [r2] \n\t"                         \
                                                \
    "sub r0, #24 \n\t"                          \
    "sub r1, #24 \n\t"                          \
    "sub r2, #24 \n\t"                          \
                                                \
    "ldr r6, [r0] \n\t"                         \
    "ldr r7, [r1], #4 \n\t"                     \
    "ldr r8, [r2], #4 \n\t"                     \
    "mov r14, #0 \n\t"                          \
    "umull r9, r10, r4, r8 \n\t"                \
    "umull r11, r12, r5, r7 \n\t"               \
    "adds r9, r9, r6 \n\t"                      \
    "adc r10, r10, #0 \n\t"                     \
    "adds r9, r9, r11 \n\t"                     \
    "adcs r10, r10, r12 \n\t"                   \
    "adc r14, r14, #0 \n\t"                     \
    "str r9, [r0], #4 \n\t"                     \
                                                \
    "ldr r6, [r0] \n\t"                         \
    "adds r10, r10, r6 \n\t"                    \
    "adcs r14, r14, #0 \n\t"                    \
    "ldr r7, [r1], #4 \n\t"                     \
    "ldr r8, [r2], #4 \n\t"                     \
    "mov r9, #0 \n\t"                           \
    "umull r11, r12, r4, r8 \n\t"               \
    "adds r10, r10, r11 \n\t"                   \
    "adcs r14, r14, r12 \n\t"                   \
    "adc r9, r9, #0 \n\t"                       \
    "umull r11, r12, r5, r7 \n\t"               \
    "adds r10, r10, r11 \n\t"                   \
    "adcs r14, r14, r12 \n\t"                   \
    "adc r9, r9, #0 \n\t"                       \
    "str r10, [r0], #4 \n\t"                    \
                                                \
    "ldr r6, [r0] \n\t"                         \
    "adds r14, r14, r6 \n\t"                    \
    "adcs r9, r9, #0 \n\t"                      \
    "ldr r7, [r1], #4 \n\t"                     \
    "ldr r8, [r2], #4 \n\t"                     \
    "mov r10, #0 \n\t"                          \
    "umull r11, r12, r4, r8 \n\t"               \
    "adds r14, r14, r11 \n\t"                   \
    "adcs r9, r9, r12 \n\t"                     \
    "adc r10, r10, #0 \n\t"                     \
    "umull r11, r12, r5, r7 \n\t"               \
    "adds r14, r14, r11 \n\t"                   \
    "adcs r9, r9, r12 \n\t"                     \
    "adc r10, r10, #0 \n\t"                     \
    "str r14, [r0], #4 \n\t"                    \
                                                \
    "ldr r6, [r0] \n\t"                         \
    "adds r9, r9, r6 \n\t"                      \
    "adcs r10, r10, #0 \n\t"                    \
    "ldr r7, [r1], #4 \n\t"                     \
    "ldr r8, [r2], #4 \n\t"                     \
    "mov r14, #0 \n\t"                          \
    "umull r11, r12, r4, r8 \n\t"               \
    "adds r9, r9, r11 \n\t"                     \
    "adcs r10, r10, r12 \n\t"                   \
    "adc r14, r14, #0 \n\t"                     \
    "umull r11, r12, r5, r7 \n\t"               \
    "adds r9, r9, r11 \n\t"                     \
    "adcs r10, r10, r12 \n\t"                   \
    "adc r14, r14, #0 \n\t"                     \
    "str r9, [r0], #4 \n\t"                     \
                                                \
    "ldr r6, [r0] \n\t"                         \
    "adds r10, r10, r6 \n\t"                    \
    "adcs r14, r14, #0 \n\t"                    \
    "ldr r7, [r1], #4 \n\t"                     \
    "ldr r8, [r2], #4 \n\t"                     \
    "mov r9, #0 \n\t"                           \
    "umull r11, r12, r4, r8 \n\t"               \
    "adds r10, r10, r11 \n\t"                   \
    "adcs r14, r14, r12 \n\t"                   \
    "adc r9, r9, #0 \n\t"                       \
    "umull r11, r12, r5, r7 \n\t"               \
    "adds r10, r10, r11 \n\t"                   \
    "adcs r14, r14, r12 \n\t"                   \
    "adc r9, r9, #0 \n\t"                       \
    "str r10, [r0], #4 \n\t"                    \
                                                \
    "ldr r6, [r0] \n\t"                         \
    "adds r14, r14, r6 \n\t"                    \
    "adcs r9, r9, #0 \n\t"                      \
    /* skip past already-loaded (r4, r5) */     \
    "ldr r7, [r1], #8 \n\t"                     \
    "ldr r8, [r2], #8 \n\t"                     \
    "mov r10, #0 \n\t"                          \
    "umull r11, r12, r4, r8 \n\t"               \
    "adds r14, r14, r11 \n\t"                   \
    "adcs r9, r9, r12 \n\t"                     \
    "adc r10, r10, #0 \n\t"                     \
    "umull r11, r12, r5, r7 \n\t"               \
    "adds r14, r14, r11 \n\t"                   \
    "adcs r9, r9, r12 \n\t"                     \
    "adc r10, r10, #0 \n\t"                     \
    "str r14, [r0], #4 \n\t"                    \
                                                \
    "umull r11, r12, r4, r5 \n\t"               \
    "adds r11, r11, r9 \n\t"                    \
    "adc r12, r12, r10 \n\t"                    \
    "stmia r0!, {r11, r12} \n\t"

#define FAST_MULT_ASM_7_TO_8                 \
    "cmp r3, #7 \n\t"                        \
    "beq 1f \n\t"                            \
                                             \
    /* r4 = left high, r5 = right high */    \
    "ldr r4, [r1] \n\t"                      \
    "ldr r5, [r2] \n\t"                      \
                                             \
    "sub r0, #28 \n\t"                       \
    "sub r1, #28 \n\t"                       \
    "sub r2, #28 \n\t"                       \
                                             \
    "ldr r6, [r0] \n\t"                      \
    "ldr r7, [r1], #4 \n\t"                  \
    "ldr r8, [r2], #4 \n\t"                  \
    "mov r14, #0 \n\t"                       \
    "umull r9, r10, r4, r8 \n\t"             \
    "umull r11, r12, r5, r7 \n\t"            \
    "adds r9, r9, r6 \n\t"                   \
    "adc r10, r10, #0 \n\t"                  \
    "adds r9, r9, r11 \n\t"                  \
    "adcs r10, r10, r12 \n\t"                \
    "adc r14, r14, #0 \n\t"                  \
    "str r9, [r0], #4 \n\t"                  \
                                             \
    "ldr r6, [r0] \n\t"                      \
    "adds r10, r10, r6 \n\t"                 \
    "adcs r14, r14, #0 \n\t"                 \
    "ldr r7, [r1], #4 \n\t"                  \
    "ldr r8, [r2], #4 \n\t"                  \
    "mov r9, #0 \n\t"                        \
    "umull r11, r12, r4, r8 \n\t"            \
    "adds r10, r10, r11 \n\t"                \
    "adcs r14, r14, r12 \n\t"                \
    "adc r9, r9, #0 \n\t"                    \
    "umull r11, r12, r5, r7 \n\t"            \
    "adds r10, r10, r11 \n\t"                \
    "adcs r14, r14, r12 \n\t"                \
    "adc r9, r9, #0 \n\t"                    \
    "str r10, [r0], #4 \n\t"                 \
                                             \
    "ldr r6, [r0] \n\t"                      \
    "adds r14, r14, r6 \n\t"                 \
    "adcs r9, r9, #0 \n\t"                   \
    "ldr r7, [r1], #4 \n\t"                  \
    "ldr r8, [r2], #4 \n\t"                  \
    "mov r10, #0 \n\t"                       \
    "umull r11, r12, r4, r8 \n\t"            \
    "adds r14, r14, r11 \n\t"                \
    "adcs r9, r9, r12 \n\t"                  \
    "adc r10, r10, #0 \n\t"                  \
    "umull r11, r12, r5, r7 \n\t"            \
    "adds r14, r14, r11 \n\t"                \
    "adcs r9, r9, r12 \n\t"                  \
    "adc r10, r10, #0 \n\t"                  \
    "str r14, [r0], #4 \n\t"                 \
                                             \
    "ldr r6, [r0] \n\t"                      \
    "adds r9, r9, r6 \n\t"                   \
    "adcs r10, r10, #0 \n\t"                 \
    "ldr r7, [r1], #4 \n\t"                  \
    "ldr r8, [r2], #4 \n\t"                  \
    "mov r14, #0 \n\t"                       \
    "umull r11, r12, r4, r8 \n\t"            \
    "adds r9, r9, r11 \n\t"                  \
    "adcs r10, r10, r12 \n\t"                \
    "adc r14, r14, #0 \n\t"                  \
    "umull r11, r12, r5, r7 \n\t"            \
    "adds r9, r9, r11 \n\t"                  \
    "adcs r10, r10, r12 \n\t"                \
    "adc r14, r14, #0 \n\t"                  \
    "str r9, [r0], #4 \n\t"                  \
                                             \
    "ldr r6, [r0] \n\t"                      \
    "adds r10, r10, r6 \n\t"                 \
    "adcs r14, r14, #0 \n\t"                 \
    "ldr r7, [r1], #4 \n\t"                  \
    "ldr r8, [r2], #4 \n\t"                  \
    "mov r9, #0 \n\t"                        \
    "umull r11, r12, r4, r8 \n\t"            \
    "adds r10, r10, r11 \n\t"                \
    "adcs r14, r14, r12 \n\t"                \
    "adc r9, r9, #0 \n\t"                    \
    "umull r11, r12, r5, r7 \n\t"            \
    "adds r10, r10, r11 \n\t"                \
    "adcs r14, r14, r12 \n\t"                \
    "adc r9, r9, #0 \n\t"                    \
    "str r10, [r0], #4 \n\t"                 \
                                             \
    "ldr r6, [r0] \n\t"                      \
    "adds r14, r14, r6 \n\t"                 \
    "adcs r9, r9, #0 \n\t"                   \
    "ldr r7, [r1], #4 \n\t"                  \
    "ldr r8, [r2], #4 \n\t"                  \
    "mov r10, #0 \n\t"                       \
    "umull r11, r12, r4, r8 \n\t"            \
    "adds r14, r14, r11 \n\t"                \
    "adcs r9, r9, r12 \n\t"                  \
    "adc r10, r10, #0 \n\t"                  \
    "umull r11, r12, r5, r7 \n\t"            \
    "adds r14, r14, r11 \n\t"                \
    "adcs r9, r9, r12 \n\t"                  \
    "adc r10, r10, #0 \n\t"                  \
    "str r14, [r0], #4 \n\t"                 \
                                             \
    "ldr r6, [r0] \n\t"                      \
    "adds r9, r9, r6 \n\t"                   \
    "adcs r10, r10, #0 \n\t"                 \
    /* skip past already-loaded (r4, r5) */  \
    "ldr r7, [r1], #8 \n\t"                  \
    "ldr r8, [r2], #8 \n\t"                  \
    "mov r14, #0 \n\t"                       \
    "umull r11, r12, r4, r8 \n\t"            \
    "adds r9, r9, r11 \n\t"                  \
    "adcs r10, r10, r12 \n\t"                \
    "adc r14, r14, #0 \n\t"                  \
    "umull r11, r12, r5, r7 \n\t"            \
    "adds r9, r9, r11 \n\t"                  \
    "adcs r10, r10, r12 \n\t"                \
    "adc r14, r14, #0 \n\t"                  \
    "str r9, [r0], #4 \n\t"                  \
                                             \
    "umull r11, r12, r4, r5 \n\t"            \
    "adds r11, r11, r10 \n\t"                \
    "adc r12, r12, r14 \n\t"                 \
    "stmia r0!, {r11, r12} \n\t"

#if (uECC_PLATFORM != uECC_arm_thumb)
uECC_VLI_API void uECC_vli_mult(uint32_t *result,
                                const uint32_t *left,
                                const uint32_t *right,
                                wordcount_t num_words) {
    register uint32_t *r0 __asm__("r0") = result;
    register const uint32_t *r1 __asm__("r1") = left;
    register const uint32_t *r2 __asm__("r2") = right;
    register uint32_t r3 __asm__("r3") = num_words;
    
    __asm__ volatile (
        ".syntax unified \n\t"
        "push {r3} \n\t"
    
#if (uECC_MIN_WORDS == 5)
        FAST_MULT_ASM_5
        "pop {r3} \n\t"
    #if (uECC_MAX_WORDS > 5)
        FAST_MULT_ASM_5_TO_6
    #endif
    #if (uECC_MAX_WORDS > 6)
        FAST_MULT_ASM_6_TO_7
    #endif
    #if (uECC_MAX_WORDS > 7)
        FAST_MULT_ASM_7_TO_8
    #endif
#elif (uECC_MIN_WORDS == 6)
        FAST_MULT_ASM_6
        "pop {r3} \n\t"
    #if (uECC_MAX_WORDS > 6)
        FAST_MULT_ASM_6_TO_7
    #endif
    #if (uECC_MAX_WORDS > 7)
        FAST_MULT_ASM_7_TO_8
    #endif
#elif (uECC_MIN_WORDS == 7)
        FAST_MULT_ASM_7
        "pop {r3} \n\t"
    #if (uECC_MAX_WORDS > 7)
        FAST_MULT_ASM_7_TO_8
    #endif
#elif (uECC_MIN_WORDS == 8)
        FAST_MULT_ASM_8
        "pop {r3} \n\t"
#endif

        "1: \n\t"
        RESUME_SYNTAX
        : "+r" (r0), "+r" (r1), "+r" (r2)
        : "r" (r3)
        : "r4", "r5", "r6", "r7", "r8", "r9", "r10", "r11", "r12", "r14", "cc", "memory"
    );
}
#define asm_mult 1

#if uECC_SQUARE_FUNC

#define FAST_SQUARE_ASM_5_TO_6           \
    "cmp r2, #5 \n\t"                    \
    "beq 1f \n\t"                        \
                                         \
    /* r3 = high */                      \
    "ldr r3, [r1] \n\t"                  \
                                         \
    "sub r0, #20 \n\t"                   \
    "sub r1, #20 \n\t"                   \
                                         \
    /* Do off-center multiplication */   \
    "ldr r14, [r1], #4 \n\t"             \
    "umull r4, r5, r3, r14 \n\t"         \
    "ldr r14, [r1], #4 \n\t"             \
    "umull r7, r6, r3, r14 \n\t"         \
    "adds r5, r5, r7 \n\t"               \
    "ldr r14, [r1], #4 \n\t"             \
    "umull r8, r7, r3, r14 \n\t"         \
    "adcs r6, r6, r8 \n\t"               \
    "ldr r14, [r1], #4 \n\t"             \
    "umull r9, r8, r3, r14 \n\t"         \
    "adcs r7, r7, r9 \n\t"               \
    /* Skip already-loaded r3 */         \
    "ldr r14, [r1], #8 \n\t"             \
    "umull r10, r9, r3, r14 \n\t"        \
    "adcs r8, r8, r10 \n\t"              \
    "adcs r9, r9, #0 \n\t"               \
                                         \
    /* Multiply by 2 */                  \
    "mov r10, #0 \n\t"                   \
    "adds r4, r4, r4 \n\t"               \
    "adcs r5, r5, r5 \n\t"               \
    "adcs r6, r6, r6 \n\t"               \
    "adcs r7, r7, r7 \n\t"               \
    "adcs r8, r8, r8 \n\t"               \
    "adcs r9, r9, r9 \n\t"               \
    "adcs r10, r10, #0 \n\t"             \
                                         \
    /* Add into previous */              \
    "ldr r14, [r0] \n\t"                 \
    "adds r4, r4, r14 \n\t"              \
    "str r4, [r0], #4 \n\t"              \
    "ldr r14, [r0] \n\t"                 \
    "adcs r5, r5, r14 \n\t"              \
    "str r5, [r0], #4 \n\t"              \
    "ldr r14, [r0] \n\t"                 \
    "adcs r6, r6, r14 \n\t"              \
    "str r6, [r0], #4 \n\t"              \
    "ldr r14, [r0] \n\t"                 \
    "adcs r7, r7, r14 \n\t"              \
    "str r7, [r0], #4 \n\t"              \
    "ldr r14, [r0] \n\t"                 \
    "adcs r8, r8, r14 \n\t"              \
    "str r8, [r0], #4 \n\t"              \
    "adcs r9, r9, #0 \n\t"               \
    "adcs r10, r10, #0 \n\t"             \
                                         \
    /* Perform center multiplication */  \
    "umull r4, r5, r3, r3 \n\t"          \
    "adds r4, r4, r9 \n\t"               \
    "adc r5, r5, r10 \n\t"               \
    "stmia r0!, {r4, r5} \n\t"           

#define FAST_SQUARE_ASM_6_TO_7               \
    "cmp r2, #6 \n\t"                        \
    "beq 1f \n\t"                            \
                                             \
    /* r3 = high */                          \
    "ldr r3, [r1] \n\t"                      \
                                             \
    "sub r0, #24 \n\t"                       \
    "sub r1, #24 \n\t"                       \
                                             \
    /* Do off-center multiplication */       \
    "ldr r14, [r1], #4 \n\t"                 \
    "umull r4, r5, r3, r14 \n\t"             \
    "ldr r14, [r1], #4 \n\t"                 \
    "umull r7, r6, r3, r14 \n\t"             \
    "adds r5, r5, r7 \n\t"                   \
    "ldr r14, [r1], #4 \n\t"                 \
    "umull r8, r7, r3, r14 \n\t"             \
    "adcs r6, r6, r8 \n\t"                   \
    "ldr r14, [r1], #4 \n\t"                 \
    "umull r9, r8, r3, r14 \n\t"             \
    "adcs r7, r7, r9 \n\t"                   \
    "ldr r14, [r1], #4 \n\t"                 \
    "umull r10, r9, r3, r14 \n\t"            \
    "adcs r8, r8, r10 \n\t"                  \
    /* Skip already-loaded r3 */             \
    "ldr r14, [r1], #8 \n\t"                 \
    "umull r11, r10, r3, r14 \n\t"           \
    "adcs r9, r9, r11 \n\t"                  \
    "adcs r10, r10, #0 \n\t"                 \
                                             \
    /* Multiply by 2 */                      \
    "mov r11, #0 \n\t"                       \
    "adds r4, r4, r4 \n\t"                   \
    "adcs r5, r5, r5 \n\t"                   \
    "adcs r6, r6, r6 \n\t"                   \
    "adcs r7, r7, r7 \n\t"                   \
    "adcs r8, r8, r8 \n\t"                   \
    "adcs r9, r9, r9 \n\t"                   \
    "adcs r10, r10, r10 \n\t"                \
    "adcs r11, r11, #0 \n\t"                 \
                                             \
    /* Add into previous */                  \
    "ldr r14, [r0] \n\t"                     \
    "adds r4, r4, r14 \n\t"                  \
    "str r4, [r0], #4 \n\t"                  \
    "ldr r14, [r0] \n\t"                     \
    "adcs r5, r5, r14 \n\t"                  \
    "str r5, [r0], #4 \n\t"                  \
    "ldr r14, [r0] \n\t"                     \
    "adcs r6, r6, r14 \n\t"                  \
    "str r6, [r0], #4 \n\t"                  \
    "ldr r14, [r0] \n\t"                     \
    "adcs r7, r7, r14 \n\t"                  \
    "str r7, [r0], #4 \n\t"                  \
    "ldr r14, [r0] \n\t"                     \
    "adcs r8, r8, r14 \n\t"                  \
    "str r8, [r0], #4 \n\t"                  \
    "ldr r14, [r0] \n\t"                     \
    "adcs r9, r9, r14 \n\t"                  \
    "str r9, [r0], #4 \n\t"                  \
    "adcs r10, r10, #0 \n\t"                 \
    "adcs r11, r11, #0 \n\t"                 \
                                             \
    /* Perform center multiplication */      \
    "umull r4, r5, r3, r3 \n\t"              \
    "adds r4, r4, r10 \n\t"                  \
    "adc r5, r5, r11 \n\t"                   \
    "stmia r0!, {r4, r5} \n\t"

#define FAST_SQUARE_ASM_7_TO_8           \
    "cmp r2, #7 \n\t"                    \
    "beq 1f \n\t"                        \
                                         \
    /* r3 = high */                      \
    "ldr r3, [r1] \n\t"                  \
                                         \
    "sub r0, #28 \n\t"                   \
    "sub r1, #28 \n\t"                   \
                                         \
    /* Do off-center multiplication */   \
    "ldr r14, [r1], #4 \n\t"             \
    "umull r4, r5, r3, r14 \n\t"         \
    "ldr r14, [r1], #4 \n\t"             \
    "umull r7, r6, r3, r14 \n\t"         \
    "adds r5, r5, r7 \n\t"               \
    "ldr r14, [r1], #4 \n\t"             \
    "umull r8, r7, r3, r14 \n\t"         \
    "adcs r6, r6, r8 \n\t"               \
    "ldr r14, [r1], #4 \n\t"             \
    "umull r9, r8, r3, r14 \n\t"         \
    "adcs r7, r7, r9 \n\t"               \
    "ldr r14, [r1], #4 \n\t"             \
    "umull r10, r9, r3, r14 \n\t"        \
    "adcs r8, r8, r10 \n\t"              \
    "ldr r14, [r1], #4 \n\t"             \
    "umull r11, r10, r3, r14 \n\t"       \
    "adcs r9, r9, r11 \n\t"              \
    /* Skip already-loaded r3 */         \
    "ldr r14, [r1], #8 \n\t"             \
    "umull r12, r11, r3, r14 \n\t"       \
    "adcs r10, r10, r12 \n\t"            \
    "adcs r11, r11, #0 \n\t"             \
                                         \
    /* Multiply by 2 */                  \
    "mov r12, #0 \n\t"                   \
    "adds r4, r4, r4 \n\t"               \
    "adcs r5, r5, r5 \n\t"               \
    "adcs r6, r6, r6 \n\t"               \
    "adcs r7, r7, r7 \n\t"               \
    "adcs r8, r8, r8 \n\t"               \
    "adcs r9, r9, r9 \n\t"               \
    "adcs r10, r10, r10 \n\t"            \
    "adcs r11, r11, r11 \n\t"            \
    "adcs r12, r12, #0 \n\t"             \
                                         \
    /* Add into previous */              \
    "ldr r14, [r0] \n\t"                 \
    "adds r4, r4, r14 \n\t"              \
    "str r4, [r0], #4 \n\t"              \
    "ldr r14, [r0] \n\t"                 \
    "adcs r5, r5, r14 \n\t"              \
    "str r5, [r0], #4 \n\t"              \
    "ldr r14, [r0] \n\t"                 \
    "adcs r6, r6, r14 \n\t"              \
    "str r6, [r0], #4 \n\t"              \
    "ldr r14, [r0] \n\t"                 \
    "adcs r7, r7, r14 \n\t"              \
    "str r7, [r0], #4 \n\t"              \
    "ldr r14, [r0] \n\t"                 \
    "adcs r8, r8, r14 \n\t"              \
    "str r8, [r0], #4 \n\t"              \
    "ldr r14, [r0] \n\t"                 \
    "adcs r9, r9, r14 \n\t"              \
    "str r9, [r0], #4 \n\t"              \
    "ldr r14, [r0] \n\t"                 \
    "adcs r10, r10, r14 \n\t"            \
    "str r10, [r0], #4 \n\t"             \
    "adcs r11, r11, #0 \n\t"             \
    "adcs r12, r12, #0 \n\t"             \
                                         \
    /* Perform center multiplication */  \
    "umull r4, r5, r3, r3 \n\t"          \
    "adds r4, r4, r11 \n\t"              \
    "adc r5, r5, r12 \n\t"               \
    "stmia r0!, {r4, r5} \n\t"           

uECC_VLI_API void uECC_vli_square(uECC_word_t *result,
                                  const uECC_word_t *left,
                                  wordcount_t num_words) {
    register uint32_t *r0 __asm__("r0") = result;
    register const uint32_t *r1 __asm__("r1") = left;
    register uint32_t r2 __asm__("r2") = num_words;
    
    __asm__ volatile (
        ".syntax unified \n\t"
        "push {r1, r2} \n\t"

#if (uECC_MIN_WORDS == 5)
        FAST_SQUARE_ASM_5
        "pop {r1, r2} \n\t"
    #if (uECC_MAX_WORDS > 5)
        "add r1, #20 \n\t"
        FAST_SQUARE_ASM_5_TO_6
    #endif
    #if (uECC_MAX_WORDS > 6)
        FAST_SQUARE_ASM_6_TO_7
    #endif
    #if (uECC_MAX_WORDS > 7)
        FAST_SQUARE_ASM_7_TO_8
    #endif
#elif (uECC_MIN_WORDS == 6)
        FAST_SQUARE_ASM_6
        "pop {r1, r2} \n\t"
    #if (uECC_MAX_WORDS > 6)
        "add r1, #24 \n\t"
        FAST_SQUARE_ASM_6_TO_7
    #endif
    #if (uECC_MAX_WORDS > 7)
        FAST_SQUARE_ASM_7_TO_8
    #endif
#elif (uECC_MIN_WORDS == 7)
        FAST_SQUARE_ASM_7
        "pop {r1, r2} \n\t"
    #if (uECC_MAX_WORDS > 7)
        "add r1, #28 \n\t"
        FAST_SQUARE_ASM_7_TO_8
    #endif
#elif (uECC_MIN_WORDS == 8)
        FAST_SQUARE_ASM_8
        "pop {r1, r2} \n\t"
#endif

        "1: \n\t"
        RESUME_SYNTAX
        : "+r" (r0), "+r" (r1)
        : "r" (r2)
        : "r3", "r4", "r5", "r6", "r7", "r8", "r9", "r10", "r11", "r12", "r14", "cc", "memory"
    );
}
#define asm_square 1
#endif /* uECC_SQUARE_FUNC */

#endif /* uECC_PLATFORM != uECC_arm_thumb */

#endif /* (uECC_OPTIMIZATION_LEVEL >= 3) */

/* ---- "Small" implementations ---- */

#if !asm_add
uECC_VLI_API uECC_word_t uECC_vli_add(uECC_word_t *result,
                                      const uECC_word_t *left,
                                      const uECC_word_t *right,
                                      wordcount_t num_words) {
    uint32_t carry = 0;
    uint32_t left_word;
    uint32_t right_word;
    
    __asm__ volatile (
        ".syntax unified \n\t"
        "1: \n\t"
        "ldmia %[lptr]!, {%[left]} \n\t"  /* Load left word. */
        "ldmia %[rptr]!, {%[right]} \n\t" /* Load right word. */
        "lsrs %[carry], #1 \n\t"          /* Set up carry flag (carry = 0 after this). */
        "adcs %[left], %[left], %[right] \n\t"   /* Add with carry. */
        "adcs %[carry], %[carry], %[carry] \n\t" /* Store carry bit. */
        "stmia %[dptr]!, {%[left]} \n\t"  /* Store result word. */
        "subs %[ctr], #1 \n\t"            /* Decrement counter. */
        "bne 1b \n\t"                     /* Loop until counter == 0. */
        RESUME_SYNTAX
        : [dptr] REG_RW (result), [lptr] REG_RW (left), [rptr] REG_RW (right),
          [ctr] REG_RW (num_words), [carry] REG_RW (carry),
          [left] REG_WRITE (left_word), [right] REG_WRITE (right_word)
        :
        : "cc", "memory"
    );
    return carry;
}
#define asm_add 1
#endif

#if !asm_sub
uECC_VLI_API uECC_word_t uECC_vli_sub(uECC_word_t *result,
                                      const uECC_word_t *left,
                                      const uECC_word_t *right,
                                      wordcount_t num_words) {
    uint32_t carry = 1; /* carry = 1 initially (means don't borrow) */
    uint32_t left_word;
    uint32_t right_word;
    
    __asm__ volatile (
        ".syntax unified \n\t"
        "1: \n\t"
        "ldmia %[lptr]!, {%[left]} \n\t"  /* Load left word. */
        "ldmia %[rptr]!, {%[right]} \n\t" /* Load right word. */
        "lsrs %[carry], #1 \n\t"          /* Set up carry flag (carry = 0 after this). */
        "sbcs %[left], %[left], %[right] \n\t"   /* Subtract with borrow. */
        "adcs %[carry], %[carry], %[carry] \n\t" /* Store carry bit. */
        "stmia %[dptr]!, {%[left]} \n\t"  /* Store result word. */
        "subs %[ctr], #1 \n\t"            /* Decrement counter. */
        "bne 1b \n\t"                     /* Loop until counter == 0. */
        RESUME_SYNTAX
        : [dptr] REG_RW (result), [lptr] REG_RW (left), [rptr] REG_RW (right),
          [ctr] REG_RW (num_words), [carry] REG_RW (carry),
          [left] REG_WRITE (left_word), [right] REG_WRITE (right_word)
        :
        : "cc", "memory"
    );
    return !carry;
}
#define asm_sub 1
#endif

#if !asm_mult
uECC_VLI_API void uECC_vli_mult(uECC_word_t *result,
                                const uECC_word_t *left,
                                const uECC_word_t *right,
                                wordcount_t num_words) {
#if (uECC_PLATFORM != uECC_arm_thumb)
    uint32_t c0 = 0;
    uint32_t c1 = 0;
    uint32_t c2 = 0;
    uint32_t k = 0;
    uint32_t i;
    uint32_t t0, t1;
    
    __asm__ volatile (
        ".syntax unified \n\t"
        
        "1: \n\t" /* outer loop (k < num_words) */
        "movs %[i], #0 \n\t" /* i = 0 */
        "b 3f \n\t"
        
        "2: \n\t" /* outer loop (k >= num_words) */
        "movs %[i], %[k] \n\t"         /* i = k */
        "subs %[i], %[last_word] \n\t" /* i = k - (num_words - 1) (times 4) */
        
        "3: \n\t" /* inner loop */
        "subs %[t0], %[k], %[i] \n\t" /* t0 = k-i */
        
        "ldr %[t1], [%[right], %[t0]] \n\t" /* t1 = right[k - i] */
        "ldr %[t0], [%[left], %[i]] \n\t"   /* t0 = left[i] */
        
        "umull %[t0], %[t1], %[t0], %[t1] \n\t" /* (t0, t1) = left[i] * right[k - i] */
        
        "adds %[c0], %[c0], %[t0] \n\t" /* add low word to c0 */
        "adcs %[c1], %[c1], %[t1] \n\t" /* add high word to c1, including carry */
        "adcs %[c2], %[c2], #0 \n\t"    /* add carry to c2 */

        "adds %[i], #4 \n\t"          /* i += 4 */
        "cmp %[i], %[last_word] \n\t" /* i > (num_words - 1) (times 4)? */
        "bgt 4f \n\t"                 /*   if so, exit the loop */
        "cmp %[i], %[k] \n\t"         /* i <= k? */
        "ble 3b \n\t"                 /*   if so, continue looping */
        
        "4: \n\t" /* end inner loop */
        
        "str %[c0], [%[result], %[k]] \n\t" /* result[k] = c0 */
        "mov %[c0], %[c1] \n\t"       /* c0 = c1 */
        "mov %[c1], %[c2] \n\t"       /* c1 = c2 */
        "movs %[c2], #0 \n\t"         /* c2 = 0 */
        "adds %[k], #4 \n\t"          /* k += 4 */
        "cmp %[k], %[last_word] \n\t" /* k <= (num_words - 1) (times 4) ? */
        "ble 1b \n\t"                 /*   if so, loop back, start with i = 0 */
        "cmp %[k], %[last_word], lsl #1 \n\t" /* k <= (num_words * 2 - 2) (times 4) ? */
        "ble 2b \n\t"                 /*   if so, loop back, start with i = (k + 1) - num_words */
        /* end outer loop */
        
        "str %[c0], [%[result], %[k]] \n\t" /* result[num_words * 2 - 1] = c0 */
        RESUME_SYNTAX
        : [c0] "+r" (c0), [c1] "+r" (c1), [c2] "+r" (c2),
          [k] "+r" (k), [i] "=&r" (i), [t0] "=&r" (t0), [t1] "=&r" (t1)
        : [result] "r" (result), [left] "r" (left), [right] "r" (right),
          [last_word] "r" ((num_words - 1) * 4)
        : "cc", "memory"
    );
    
#else /* Thumb-1 */
    uint32_t r4, r5, r6, r7;

    __asm__ volatile (
        ".syntax unified \n\t"
        "subs %[r3], #1 \n\t" /* r3 = num_words - 1 */
        "lsls %[r3], #2 \n\t" /* r3 = (num_words - 1) * 4 */
        "mov r8, %[r3] \n\t"  /* r8 = (num_words - 1) * 4 */
        "lsls %[r3], #1 \n\t" /* r3 = (num_words - 1) * 8 */
        "mov r9, %[r3] \n\t"  /* r9 = (num_words - 1) * 8 */
        "movs %[r3], #0 \n\t" /* c0 = 0 */
        "movs %[r4], #0 \n\t" /* c1 = 0 */
        "movs %[r5], #0 \n\t" /* c2 = 0 */
        "movs %[r6], #0 \n\t" /* k = 0 */
        
        "push {%[r0]} \n\t" /* keep result on the stack */
        
        "1: \n\t" /* outer loop (k < num_words) */
        "movs %[r7], #0 \n\t" /* r7 = i = 0 */
        "b 3f \n\t"
        
        "2: \n\t" /* outer loop (k >= num_words) */
        "movs %[r7], %[r6] \n\t" /* r7 = k */
        "mov %[r0], r8 \n\t"     /* r0 = (num_words - 1) * 4 */
        "subs %[r7], %[r0] \n\t" /* r7 = i = k - (num_words - 1) (times 4) */
        
        "3: \n\t" /* inner loop */
        "push {%[r6]} \n\t"
        "push {%[r5]} \n\t"
        "push {%[r4]} \n\t"
        "push {%[r3]} \n\t" /* push things, r3 (c0) is at the top of stack. */
        "subs %[r0], %[r6], %[r7] \n\t"          /* r0 = k - i */
        
        "ldr %[r4], [%[r2], %[r0]] \n\t" /* r4 = right[k - i] */
        "ldr %[r0], [%[r1], %[r7]] \n\t" /* r0 = left[i] */
        
        "lsrs %[r3], %[r0], #16 \n\t" /* r3 = a1 */
        "uxth %[r0], %[r0] \n\t"      /* r0 = a0 */
        
        "lsrs %[r5], %[r4], #16 \n\t" /* r5 = b1 */
        "uxth %[r4], %[r4] \n\t"      /* r4 = b0 */
        
        "movs %[r6], %[r3] \n\t"        /* r6 = a1 */
        "muls %[r6], %[r5], %[r6] \n\t" /* r6 = a1 * b1 */
        "muls %[r3], %[r4], %[r3] \n\t" /* r3 = b0 * a1 */
        "muls %[r5], %[r0], %[r5] \n\t" /* r5 = a0 * b1 */
        "muls %[r0], %[r4], %[r0] \n\t" /* r0 = a0 * b0 */
        
        "movs %[r4], #0 \n\t"    /* r4 = 0 */
        "adds %[r3], %[r5] \n\t" /* r3 = b0 * a1 + a0 * b1 */
        "adcs %[r4], %[r4] \n\t" /* r4 = carry */
        "lsls %[r4], #16 \n\t"   /* r4 = carry << 16 */
        "adds %[r6], %[r4] \n\t" /* r6 = a1 * b1 + carry */
        
        "lsls %[r4], %[r3], #16 \n\t" /* r4 = (b0 * a1 + a0 * b1) << 16 */
        "lsrs %[r3], #16 \n\t"        /* r3 = (b0 * a1 + a0 * b1) >> 16 */
        "adds %[r0], %[r4] \n\t"      /* r0 = low word = a0 * b0 + ((b0 * a1 + a0 * b1) << 16) */
        "adcs %[r6], %[r3] \n\t"      /* r6 = high word =
                                              a1 * b1 + carry + ((b0 * a1 + a0 * b1) >> 16) */
        
        "pop {%[r3]} \n\t" /* r3 = c0 */
        "pop {%[r4]} \n\t" /* r4 = c1 */
        "pop {%[r5]} \n\t" /* r5 = c2 */
        "adds %[r3], %[r0] \n\t"         /* add low word to c0 */
        "adcs %[r4], %[r6] \n\t"         /* add high word to c1, including carry */
        "movs %[r0], #0 \n\t"            /* r0 = 0 (does not affect carry bit) */
        "adcs %[r5], %[r0] \n\t"         /* add carry to c2 */
        
        "pop {%[r6]} \n\t" /* r6 = k */

        "adds %[r7], #4 \n\t"   /* i += 4 */
        "cmp %[r7], r8 \n\t"    /* i > (num_words - 1) (times 4)? */
        "bgt 4f \n\t"           /*   if so, exit the loop */
        "cmp %[r7], %[r6] \n\t" /* i <= k? */
        "ble 3b \n\t"           /*   if so, continue looping */
        
        "4: \n\t" /* end inner loop */
        
        "ldr %[r0], [sp, #0] \n\t" /* r0 = result */
        
        "str %[r3], [%[r0], %[r6]] \n\t" /* result[k] = c0 */
        "mov %[r3], %[r4] \n\t"          /* c0 = c1 */
        "mov %[r4], %[r5] \n\t"          /* c1 = c2 */
        "movs %[r5], #0 \n\t"            /* c2 = 0 */
        "adds %[r6], #4 \n\t"            /* k += 4 */
        "cmp %[r6], r8 \n\t"             /* k <= (num_words - 1) (times 4) ? */
        "ble 1b \n\t"                    /*   if so, loop back, start with i = 0 */
        "cmp %[r6], r9 \n\t"             /* k <= (num_words * 2 - 2) (times 4) ? */
        "ble 2b \n\t"                    /*   if so, loop back, with i = (k + 1) - num_words */
        /* end outer loop */
        
        "str %[r3], [%[r0], %[r6]] \n\t" /* result[num_words * 2 - 1] = c0 */
        "pop {%[r0]} \n\t"               /* pop result off the stack */
        
        ".syntax divided \n\t"
        : [r3] "+l" (num_words), [r4] "=&l" (r4),
          [r5] "=&l" (r5), [r6] "=&l" (r6), [r7] "=&l" (r7)
        : [r0] "l" (result), [r1] "l" (left), [r2] "l" (right)
        : "r8", "r9", "cc", "memory"
    );
#endif
}
#define asm_mult 1
#endif

#if uECC_SQUARE_FUNC
#if !asm_square
uECC_VLI_API void uECC_vli_square(uECC_word_t *result,
                                  const uECC_word_t *left,
                                  wordcount_t num_words) {
#if (uECC_PLATFORM != uECC_arm_thumb)
    uint32_t c0 = 0;
    uint32_t c1 = 0;
    uint32_t c2 = 0;
    uint32_t k = 0;
    uint32_t i, tt;
    uint32_t t0, t1;
    
    __asm__ volatile (
        ".syntax unified \n\t"
        
        "1: \n\t" /* outer loop (k < num_words) */
        "movs %[i], #0 \n\t" /* i = 0 */
        "b 3f \n\t"
        
        "2: \n\t" /* outer loop (k >= num_words) */
        "movs %[i], %[k] \n\t"         /* i = k */
        "subs %[i], %[last_word] \n\t" /* i = k - (num_words - 1) (times 4) */
        
        "3: \n\t" /* inner loop */
        "subs %[tt], %[k], %[i] \n\t" /* tt = k-i */
        
        "ldr %[t1], [%[left], %[tt]] \n\t" /* t1 = left[k - i] */
        "ldr %[t0], [%[left], %[i]] \n\t"  /* t0 = left[i] */
        
        "umull %[t0], %[t1], %[t0], %[t1] \n\t" /* (t0, t1) = left[i] * right[k - i] */
        
        "cmp %[i], %[tt] \n\t"      /* (i < k - i) ? */
        "bge 4f \n\t"               /*   if i >= k - i, skip */
        "lsls %[t1], #1 \n\t"       /* high word << 1 */
        "adc %[c2], %[c2], #0 \n\t" /* add carry bit to c2 */
        "lsls %[t0], #1 \n\t"       /* low word << 1 */
        "adc %[t1], %[t1], #0 \n\t" /* add carry bit to high word */
        
        "4: \n\t"

        "adds %[c0], %[c0], %[t0] \n\t" /* add low word to c0 */
        "adcs %[c1], %[c1], %[t1] \n\t" /* add high word to c1, including carry */
        "adcs %[c2], %[c2], #0 \n\t"    /* add carry to c2 */
        
        "adds %[i], #4 \n\t"          /* i += 4 */
        "cmp %[i], %[k] \n\t"         /* i >= k? */
        "bge 5f \n\t"                 /*   if so, exit the loop */
        "subs %[tt], %[k], %[i] \n\t" /* tt = k - i */
        "cmp %[i], %[tt] \n\t"        /* i <= k - i? */
        "ble 3b \n\t"                 /*   if so, continue looping */
        
        "5: \n\t" /* end inner loop */
        
        "str %[c0], [%[result], %[k]] \n\t" /* result[k] = c0 */
        "mov %[c0], %[c1] \n\t"       /* c0 = c1 */
        "mov %[c1], %[c2] \n\t"       /* c1 = c2 */
        "movs %[c2], #0 \n\t"         /* c2 = 0 */
        "adds %[k], #4 \n\t"          /* k += 4 */
        "cmp %[k], %[last_word] \n\t" /* k <= (num_words - 1) (times 4) ? */
        "ble 1b \n\t"                 /*   if so, loop back, start with i = 0 */
        "cmp %[k], %[last_word], lsl #1 \n\t" /* k <= (num_words * 2 - 2) (times 4) ? */
        "ble 2b \n\t"                 /*   if so, loop back, start with i = (k + 1) - num_words */
        /* end outer loop */
        
        "str %[c0], [%[result], %[k]] \n\t" /* result[num_words * 2 - 1] = c0 */
        RESUME_SYNTAX
        : [c0] "+r" (c0), [c1] "+r" (c1), [c2] "+r" (c2),
          [k] "+r" (k), [i] "=&r" (i), [tt] "=&r" (tt), [t0] "=&r" (t0), [t1] "=&r" (t1)
        : [result] "r" (result), [left] "r" (left), [last_word] "r" ((num_words - 1) * 4)
        : "cc", "memory"
    );
    
#else
    uint32_t r3, r4, r5, r6, r7;

    __asm__ volatile (
        ".syntax unified \n\t"
        "subs %[r2], #1 \n\t" /* r2 = num_words - 1 */
        "lsls %[r2], #2 \n\t" /* r2 = (num_words - 1) * 4 */
        "mov r8, %[r2] \n\t"  /* r8 = (num_words - 1) * 4 */
        "lsls %[r2], #1 \n\t" /* r2 = (num_words - 1) * 8 */
        "mov r9, %[r2] \n\t"  /* r9 = (num_words - 1) * 8 */
        "movs %[r2], #0 \n\t" /* c0 = 0 */
        "movs %[r3], #0 \n\t" /* c1 = 0 */
        "movs %[r4], #0 \n\t" /* c2 = 0 */
        "movs %[r5], #0 \n\t" /* k = 0 */
        
        "push {%[r0]} \n\t" /* keep result on the stack */
        
        "1: \n\t" /* outer loop (k < num_words) */
        "movs %[r6], #0 \n\t" /* r6 = i = 0 */
        "b 3f \n\t"
        
        "2: \n\t" /* outer loop (k >= num_words) */
        "movs %[r6], %[r5] \n\t" /* r6 = k */
        "mov %[r0], r8 \n\t"     /* r0 = (num_words - 1) * 4 */
        "subs %[r6], %[r0] \n\t" /* r6 = i = k - (num_words - 1) (times 4) */
        
        "3: \n\t" /* inner loop */
        "push {%[r5]} \n\t"
        "push {%[r4]} \n\t"
        "push {%[r3]} \n\t"
        "push {%[r2]} \n\t" /* push things, r2 (c0) is at the top of stack. */
        "subs %[r7], %[r5], %[r6] \n\t"          /* r7 = k - i */
        
        "ldr %[r3], [%[r1], %[r7]] \n\t" /* r3 = left[k - i] */
        "ldr %[r0], [%[r1], %[r6]] \n\t" /* r0 = left[i] */
        
        "lsrs %[r2], %[r0], #16 \n\t" /* r2 = a1 */
        "uxth %[r0], %[r0] \n\t"      /* r0 = a0 */
        
        "lsrs %[r4], %[r3], #16 \n\t" /* r4 = b1 */
        "uxth %[r3], %[r3] \n\t"      /* r3 = b0 */
        
        "movs %[r5], %[r2] \n\t"        /* r5 = a1 */
        "muls %[r5], %[r4], %[r5] \n\t" /* r5 = a1 * b1 */
        "muls %[r2], %[r3], %[r2] \n\t" /* r2 = b0 * a1 */
        "muls %[r4], %[r0], %[r4] \n\t" /* r4 = a0 * b1 */
        "muls %[r0], %[r3], %[r0] \n\t" /* r0 = a0 * b0 */
        
        "movs %[r3], #0 \n\t"    /* r3 = 0 */
        "adds %[r2], %[r4] \n\t" /* r2 = b0 * a1 + a0 * b1 */
        "adcs %[r3], %[r3] \n\t" /* r3 = carry */
        "lsls %[r3], #16 \n\t"   /* r3 = carry << 16 */
        "adds %[r5], %[r3] \n\t" /* r5 = a1 * b1 + carry */
        
        "lsls %[r3], %[r2], #16 \n\t" /* r3 = (b0 * a1 + a0 * b1) << 16 */
        "lsrs %[r2], #16 \n\t"        /* r2 = (b0 * a1 + a0 * b1) >> 16 */
        "adds %[r0], %[r3] \n\t"      /* r0 = low word = a0 * b0 + ((b0 * a1 + a0 * b1) << 16) */
        "adcs %[r5], %[r2] \n\t"      /* r5 = high word = 
                                              a1 * b1 + carry + ((b0 * a1 + a0 * b1) >> 16) */
    
        "movs %[r3], #0 \n\t"    /* r3 = 0 */
        "cmp %[r6], %[r7] \n\t"  /* (i < k - i) ? */
        "mov %[r7], %[r3] \n\t"  /* r7 = 0 (does not affect condition) */
        "bge 4f \n\t"            /*   if i >= k - i, skip */
        "lsls %[r5], #1 \n\t"    /* high word << 1 */
        "adcs %[r7], %[r3] \n\t" /* r7 = carry bit for c2 */
        "lsls %[r0], #1 \n\t"    /* low word << 1 */
        "adcs %[r5], %[r3] \n\t" /* add carry from shift to high word */
        
        "4: \n\t"
        "pop {%[r2]} \n\t" /* r2 = c0 */
        "pop {%[r3]} \n\t" /* r3 = c1 */
        "pop {%[r4]} \n\t" /* r4 = c2 */
        "adds %[r2], %[r0] \n\t"         /* add low word to c0 */
        "adcs %[r3], %[r5] \n\t"         /* add high word to c1, including carry */
        "movs %[r0], #0 \n\t"            /* r0 = 0 (does not affect carry bit) */
        "adcs %[r4], %[r0] \n\t"         /* add carry to c2 */
        "adds %[r4], %[r7] \n\t"         /* add carry from doubling (if any) */
        
        "pop {%[r5]} \n\t" /* r5 = k */
        
        "adds %[r6], #4 \n\t"           /* i += 4 */
        "cmp %[r6], %[r5] \n\t"         /* i >= k? */
        "bge 5f \n\t"                   /*   if so, exit the loop */
        "subs %[r7], %[r5], %[r6] \n\t" /* r7 = k - i */
        "cmp %[r6], %[r7] \n\t"         /* i <= k - i? */
        "ble 3b \n\t"                   /*   if so, continue looping */
        
        "5: \n\t" /* end inner loop */
        
        "ldr %[r0], [sp, #0] \n\t" /* r0 = result */
        
        "str %[r2], [%[r0], %[r5]] \n\t" /* result[k] = c0 */
        "mov %[r2], %[r3] \n\t"          /* c0 = c1 */
        "mov %[r3], %[r4] \n\t"          /* c1 = c2 */
        "movs %[r4], #0 \n\t"            /* c2 = 0 */
        "adds %[r5], #4 \n\t"            /* k += 4 */
        "cmp %[r5], r8 \n\t"             /* k <= (num_words - 1) (times 4) ? */
        "ble 1b \n\t"                    /*   if so, loop back, start with i = 0 */
        "cmp %[r5], r9 \n\t"             /* k <= (num_words * 2 - 2) (times 4) ? */
        "ble 2b \n\t"                    /*   if so, loop back, with i = (k + 1) - num_words */
        /* end outer loop */
        
        "str %[r2], [%[r0], %[r5]] \n\t" /* result[num_words * 2 - 1] = c0 */
        "pop {%[r0]} \n\t"               /* pop result off the stack */

        ".syntax divided \n\t"
        : [r2] "+l" (num_words), [r3] "=&l" (r3), [r4] "=&l" (r4),
          [r5] "=&l" (r5), [r6] "=&l" (r6), [r7] "=&l" (r7)
        : [r0] "l" (result), [r1] "l" (left)
        : "r8", "r9", "cc", "memory"
    );
#endif
}
#define asm_square 1
#endif
#endif /* uECC_SQUARE_FUNC */

#endif /* _UECC_ASM_ARM_H_ */


================================================
FILE: u2f/asm_arm_mult_square.h
================================================
/* Copyright 2015, Kenneth MacKay. Licensed under the BSD 2-clause license. */

#ifndef _UECC_ASM_ARM_MULT_SQUARE_H_
#define _UECC_ASM_ARM_MULT_SQUARE_H_

#define FAST_MULT_ASM_5                \
    "add r0, 12 \n\t"                  \
    "add r2, 12 \n\t"                  \
    "ldmia r1!, {r3,r4} \n\t"          \
    "ldmia r2!, {r6,r7} \n\t"          \
                                       \
    "umull r11, r12, r3, r6 \n\t"      \
    "stmia r0!, {r11} \n\t"            \
                                       \
    "mov r10, #0 \n\t"                 \
    "umull r11, r9, r3, r7 \n\t"       \
    "adds r12, r12, r11 \n\t"          \
    "adc r9, r9, #0 \n\t"              \
    "umull r11, r14, r4, r6 \n\t"      \
    "adds r12, r12, r11 \n\t"          \
    "adcs r9, r9, r14 \n\t"            \
    "adc r10, r10, #0 \n\t"            \
    "stmia r0!, {r12} \n\t"            \
                                       \
    "umull r12, r14, r4, r7 \n\t"      \
    "adds r9, r9, r12 \n\t"            \
    "adc r10, r10, r14 \n\t"           \
    "stmia r0!, {r9, r10} \n\t"        \
                                       \
    "sub r0, 28 \n\t"                  \
    "sub r2, 20 \n\t"                  \
    "ldmia r2!, {r6,r7,r8} \n\t"       \
    "ldmia r1!, {r5} \n\t"             \
                                       \
    "umull r11, r12, r3, r6 \n\t"      \
    "stmia r0!, {r11} \n\t"            \
                                       \
    "mov r10, #0 \n\t"                 \
    "umull r11, r9, r3, r7 \n\t"       \
    "adds r12, r12, r11 \n\t"          \
    "adc r9, r9, #0 \n\t"              \
    "umull r11, r14, r4, r6 \n\t"      \
    "adds r12, r12, r11 \n\t"          \
    "adcs r9, r9, r14 \n\t"            \
    "adc r10, r10, #0 \n\t"            \
    "stmia r0!, {r12} \n\t"            \
                                       \
    "mov r11, #0 \n\t"                 \
    "umull r12, r14, r3, r8 \n\t"      \
    "adds r9, r9, r12 \n\t"            \
    "adcs r10, r10, r14 \n\t"          \
    "adc r11, r11, #0 \n\t"            \
    "umull r12, r14, r4, r7 \n\t"      \
    "adds r9, r9, r12 \n\t"            \
    "adcs r10, r10, r14 \n\t"          \
    "adc r11, r11, #0 \n\t"            \
    "umull r12, r14, r5, r6 \n\t"      \
    "adds r9, r9, r12 \n\t"            \
    "adcs r10, r10, r14 \n\t"          \
    "adc r11, r11, #0 \n\t"            \
    "stmia r0!, {r9} \n\t"             \
                                       \
    "ldmia r1!, {r3} \n\t"             \
    "mov r12, #0 \n\t"                 \
    "umull r14, r9, r4, r8 \n\t"       \
    "adds r10, r10, r14 \n\t"          \
    "adcs r11, r11, r9 \n\t"           \
    "adc r12, r12, #0 \n\t"            \
    "umull r14, r9, r5, r7 \n\t"       \
    "adds r10, r10, r14 \n\t"          \
    "adcs r11, r11, r9 \n\t"           \
    "adc r12, r12, #0 \n\t"            \
    "umull r14, r9, r3, r6 \n\t"       \
    "adds r10, r10, r14 \n\t"          \
    "adcs r11, r11, r9 \n\t"           \
    "adc r12, r12, #0 \n\t"            \
    "ldr r14, [r0] \n\t"               \
    "adds r10, r10, r14 \n\t"          \
    "adcs r11, r11, #0 \n\t"           \
    "adc r12, r12, #0 \n\t"            \
    "stmia r0!, {r10} \n\t"            \
                                       \
    "ldmia r1!, {r4} \n\t"             \
    "mov r14, #0 \n\t"                 \
    "umull r9, r10, r5, r8 \n\t"       \
    "adds r11, r11, r9 \n\t"           \
    "adcs r12, r12, r10 \n\t"          \
    "adc r14, r14, #0 \n\t"            \
    "umull r9, r10, r3, r7 \n\t"       \
    "adds r11, r11, r9 \n\t"           \
    "adcs r12, r12, r10 \n\t"          \
    "adc r14, r14, #0 \n\t"            \
    "umull r9, r10, r4, r6 \n\t"       \
    "adds r11, r11, r9 \n\t"           \
    "adcs r12, r12, r10 \n\t"          \
    "adc r14, r14, #0 \n\t"            \
    "ldr r9, [r0] \n\t"                \
    "adds r11, r11, r9 \n\t"           \
    "adcs r12, r12, #0 \n\t"           \
    "adc r14, r14, #0 \n\t"            \
    "stmia r0!, {r11} \n\t"            \
                                       \
    "ldmia r2!, {r6} \n\t"             \
    "mov r9, #0 \n\t"                  \
    "umull r10, r11, r5, r6 \n\t"      \
    "adds r12, r12, r10 \n\t"          \
    "adcs r14, r14, r11 \n\t"          \
    "adc r9, r9, #0 \n\t"              \
    "umull r10, r11, r3, r8 \n\t"      \
    "adds r12, r12, r10 \n\t"          \
    "adcs r14, r14, r11 \n\t"          \
    "adc r9, r9, #0 \n\t"              \
    "umull r10, r11, r4, r7 \n\t"      \
    "adds r12, r12, r10 \n\t"          \
    "adcs r14, r14, r11 \n\t"          \
    "adc r9, r9, #0 \n\t"              \
    "ldr r10, [r0] \n\t"               \
    "adds r12, r12, r10 \n\t"          \
    "adcs r14, r14, #0 \n\t"           \
    "adc r9, r9, #0 \n\t"              \
    "stmia r0!, {r12} \n\t"            \
                                       \
    "ldmia r2!, {r7} \n\t"             \
    "mov r10, #0 \n\t"                 \
    "umull r11, r12, r5, r7 \n\t"      \
    "adds r14, r14, r11 \n\t"          \
    "adcs r9, r9, r12 \n\t"            \
    "adc r10, r10, #0 \n\t"            \
    "umull r11, r12, r3, r6 \n\t"      \
    "adds r14, r14, r11 \n\t"          \
    "adcs r9, r9, r12 \n\t"            \
    "adc r10, r10, #0 \n\t"            \
    "umull r11, r12, r4, r8 \n\t"      \
    "adds r14, r14, r11 \n\t"          \
    "adcs r9, r9, r12 \n\t"            \
    "adc r10, r10, #0 \n\t"            \
    "ldr r11, [r0] \n\t"               \
    "adds r14, r14, r11 \n\t"          \
    "adcs r9, r9, #0 \n\t"             \
    "adc r10, r10, #0 \n\t"            \
    "stmia r0!, {r14} \n\t"            \
                                       \
    "mov r11, #0 \n\t"                 \
    "umull r12, r14, r3, r7 \n\t"      \
    "adds r9, r9, r12 \n\t"            \
    "adcs r10, r10, r14 \n\t"          \
    "adc r11, r11, #0 \n\t"            \
    "umull r12, r14, r4, r6 \n\t"      \
    "adds r9, r9, r12 \n\t"            \
    "adcs r10, r10, r14 \n\t"          \
    "adc r11, r11, #0 \n\t"            \
    "stmia r0!, {r9} \n\t"             \
                                       \
    "umull r14, r9, r4, r7 \n\t"       \
    "adds r10, r10, r14 \n\t"          \
    "adc r11, r11, r9 \n\t"            \
    "stmia r0!, {r10, r11} \n\t"

#define FAST_MULT_ASM_6             \
    "add r0, 12 \n\t"               \
    "add r2, 12 \n\t"               \
    "ldmia r1!, {r3,r4,r5} \n\t"    \
    "ldmia r2!, {r6,r7,r8} \n\t"    \
                                    \
    "umull r11, r12, r3, r6 \n\t"   \
    "stmia r0!, {r11} \n\t"         \
                                    \
    "mov r10, #0 \n\t"              \
    "umull r11, r9, r3, r7 \n\t"    \
    "adds r12, r12, r11 \n\t"       \
    "adc r9, r9, #0 \n\t"           \
    "umull r11, r14, r4, r6 \n\t"   \
    "adds r12, r12, r11 \n\t"       \
    "adcs r9, r9, r14 \n\t"         \
    "adc r10, r10, #0 \n\t"         \
    "stmia r0!, {r12} \n\t"         \
                                    \
    "mov r11, #0 \n\t"              \
    "umull r12, r14, r3, r8 \n\t"   \
    "adds r9, r9, r12 \n\t"         \
    "adcs r10, r10, r14 \n\t"       \
    "adc r11, r11, #0 \n\t"         \
    "umull r12, r14, r4, r7 \n\t"   \
    "adds r9, r9, r12 \n\t"         \
    "adcs r10, r10, r14 \n\t"       \
    "adc r11, r11, #0 \n\t"         \
    "umull r12, r14, r5, r6 \n\t"   \
    "adds r9, r9, r12 \n\t"         \
    "adcs r10, r10, r14 \n\t"       \
    "adc r11, r11, #0 \n\t"         \
    "stmia r0!, {r9} \n\t"          \
                                    \
    "mov r12, #0 \n\t"              \
    "umull r14, r9, r4, r8 \n\t"    \
    "adds r10, r10, r14 \n\t"       \
    "adcs r11, r11, r9 \n\t"        \
    "adc r12, r12, #0 \n\t"         \
    "umull r14, r9, r5, r7 \n\t"    \
    "adds r10, r10, r14 \n\t"       \
    "adcs r11, r11, r9 \n\t"        \
    "adc r12, r12, #0 \n\t"         \
    "stmia r0!, {r10} \n\t"         \
                                    \
    "umull r9, r10, r5, r8 \n\t"    \
    "adds r11, r11, r9 \n\t"        \
    "adc r12, r12, r10 \n\t"        \
    "stmia r0!, {r11, r12} \n\t"    \
                                    \
    "sub r0, 36 \n\t"               \
    "sub r2, 24 \n\t"               \
    "ldmia r2!, {r6,r7,r8} \n\t"    \
                                    \
    "umull r11, r12, r3, r6 \n\t"   \
    "stmia r0!, {r11} \n\t"         \
                                    \
    "mov r10, #0 \n\t"              \
    "umull r11, r9, r3, r7 \n\t"    \
    "adds r12, r12, r11 \n\t"       \
    "adc r9, r9, #0 \n\t"           \
    "umull r11, r14, r4, r6 \n\t"   \
    "adds r12, r12, r11 \n\t"       \
    "adcs r9, r9, r14 \n\t"         \
    "adc r10, r10, #0 \n\t"         \
    "stmia r0!, {r12} \n\t"         \
                                    \
    "mov r11, #0 \n\t"              \
    "umull r12, r14, r3, r8 \n\t"   \
    "adds r9, r9, r12 \n\t"         \
    "adcs r10, r10, r14 \n\t"       \
    "adc r11, r11, #0 \n\t"         \
    "umull r12, r14, r4, r7 \n\t"   \
    "adds r9, r9, r12 \n\t"         \
    "adcs r10, r10, r14 \n\t"       \
    "adc r11, r11, #0 \n\t"         \
    "umull r12, r14, r5, r6 \n\t"   \
    "adds r9, r9, r12 \n\t"         \
    "adcs r10, r10, r14 \n\t"       \
    "adc r11, r11, #0 \n\t"         \
    "stmia r0!, {r9} \n\t"          \
                                    \
    "ldmia r1!, {r3} \n\t"          \
    "mov r12, #0 \n\t"              \
    "umull r14, r9, r4, r8 \n\t"    \
    "adds r10, r10, r14 \n\t"       \
    "adcs r11, r11, r9 \n\t"        \
    "adc r12, r12, #0 \n\t"         \
    "umull r14, r9, r5, r7 \n\t"    \
    "adds r10, r10, r14 \n\t"       \
    "adcs r11, r11, r9 \n\t"        \
    "adc r12, r12, #0 \n\t"         \
    "umull r14, r9, r3, r6 \n\t"    \
    "adds r10, r10, r14 \n\t"       \
    "adcs r11, r11, r9 \n\t"        \
    "adc r12, r12, #0 \n\t"         \
    "ldr r14, [r0] \n\t"            \
    "adds r10, r10, r14 \n\t"       \
    "adcs r11, r11, #0 \n\t"        \
    "adc r12, r12, #0 \n\t"         \
    "stmia r0!, {r10} \n\t"         \
                                    \
    "ldmia r1!, {r4} \n\t"          \
    "mov r14, #0 \n\t"              \
    "umull r9, r10, r5, r8 \n\t"    \
    "adds r11, r11, r9 \n\t"        \
    "adcs r12, r12, r10 \n\t"       \
    "adc r14, r14, #0 \n\t"         \
    "umull r9, r10, r3, r7 \n\t"    \
    "adds r11, r11, r9 \n\t"        \
    "adcs r12, r12, r10 \n\t"       \
    "adc r14, r14, #0 \n\t"         \
    "umull r9, r10, r4, r6 \n\t"    \
    "adds r11, r11, r9 \n\t"        \
    "adcs r12, r12, r10 \n\t"       \
    "adc r14, r14, #0 \n\t"         \
    "ldr r9, [r0] \n\t"             \
    "adds r11, r11, r9 \n\t"        \
    "adcs r12, r12, #0 \n\t"        \
    "adc r14, r14, #0 \n\t"         \
    "stmia r0!, {r11} \n\t"         \
                                    \
    "ldmia r1!, {r5} \n\t"          \
    "mov r9, #0 \n\t"               \
    "umull r10, r11, r3, r8 \n\t"   \
    "adds r12, r12, r10 \n\t"       \
    "adcs r14, r14, r11 \n\t"       \
    "adc r9, r9, #0 \n\t"           \
    "umull r10, r11, r4, r7 \n\t"   \
    "adds r12, r12, r10 \n\t"       \
    "adcs r14, r14, r11 \n\t"       \
    "adc r9, r9, #0 \n\t"           \
    "umull r10, r11, r5, r6 \n\t"   \
    "adds r12, r12, r10 \n\t"       \
    "adcs r14, r14, r11 \n\t"       \
    "adc r9, r9, #0 \n\t"           \
    "ldr r10, [r0] \n\t"            \
    "adds r12, r12, r10 \n\t"       \
    "adcs r14, r14, #0 \n\t"        \
    "adc r9, r9, #0 \n\t"           \
    "stmia r0!, {r12} \n\t"         \
                                    \
    "ldmia r2!, {r6} \n\t"          \
    "mov r10, #0 \n\t"              \
    "umull r11, r12, r3, r6 \n\t"   \
    "adds r14, r14, r11 \n\t"       \
    "adcs r9, r9, r12 \n\t"         \
    "adc r10, r10, #0 \n\t"         \
    "umull r11, r12, r4, r8 \n\t"   \
    "adds r14, r14, r11 \n\t"       \
    "adcs r9, r9, r12 \n\t"         \
    "adc r10, r10, #0 \n\t"         \
    "umull r11, r12, r5, r7 \n\t"   \
    "adds r14, r14, r11 \n\t"       \
    "adcs r9, r9, r12 \n\t"         \
    "adc r10, r10, #0 \n\t"         \
    "ldr r11, [r0] \n\t"            \
    "adds r14, r14, r11 \n\t"       \
    "adcs r9, r9, #0 \n\t"          \
    "adc r10, r10, #0 \n\t"         \
    "stmia r0!, {r14} \n\t"         \
                                    \
    "ldmia r2!, {r7} \n\t"          \
    "mov r11, #0 \n\t"              \
    "umull r12, r14, r3, r7 \n\t"   \
    "adds r9, r9, r12 \n\t"         \
    "adcs r10, r10, r14 \n\t"       \
    "adc r11, r11, #0 \n\t"         \
    "umull r12, r14, r4, r6 \n\t"   \
    "adds r9, r9, r12 \n\t"         \
    "adcs r10, r10, r14 \n\t"       \
    "adc r11, r11, #0 \n\t"         \
    "umull r12, r14, r5, r8 \n\t"   \
    "adds r9, r9, r12 \n\t"         \
    "adcs r10, r10, r14 \n\t"       \
    "adc r11, r11, #0 \n\t"         \
    "ldr r12, [r0] \n\t"            \
    "adds r9, r9, r12 \n\t"         \
    "adcs r10, r10, #0 \n\t"        \
    "adc r11, r11, #0 \n\t"         \
    "stmia r0!, {r9} \n\t"          \
                                    \
    "ldmia r2!, {r8} \n\t"          \
    "mov r12, #0 \n\t"              \
    "umull r14, r9, r3, r8 \n\t"    \
    "adds r10, r10, r14 \n\t"       \
    "adcs r11, r11, r9 \n\t"        \
    "adc r12, r12, #0 \n\t"         \
    "umull r14, r9, r4, r7 \n\t"    \
    "adds r10, r10, r14 \n\t"       \
    "adcs r11, r11, r9 \n\t"        \
    "adc r12, r12, #0 \n\t"         \
    "umull r14, r9, r5, r6 \n\t"    \
    "adds r10, r10, r14 \n\t"       \
    "adcs r11, r11, r9 \n\t"        \
    "adc r12, r12, #0 \n\t"         \
    "ldr r14, [r0] \n\t"            \
    "adds r10, r10, r14 \n\t"       \
    "adcs r11, r11, #0 \n\t"        \
    "adc r12, r12, #0 \n\t"         \
    "stmia r0!, {r10} \n\t"         \
                                    \
    "mov r14, #0 \n\t"              \
    "umull r9, r10, r4, r8 \n\t"    \
    "adds r11, r11, r9 \n\t"        \
    "adcs r12, r12, r10 \n\t"       \
    "adc r14, r14, #0 \n\t"         \
    "umull r9, r10, r5, r7 \n\t"    \
    "adds r11, r11, r9 \n\t"        \
    "adcs r12, r12, r10 \n\t"       \
    "adc r14, r14, #0 \n\t"         \
    "stmia r0!, {r11} \n\t"         \
                                    \
    "umull r10, r11, r5, r8 \n\t"   \
    "adds r12, r12, r10 \n\t"       \
    "adc r14, r14, r11 \n\t"        \
    "stmia r0!, {r12, r14} \n\t"

#define FAST_MULT_ASM_7                \
    "add r0, 24 \n\t"                  \
    "add r2, 24 \n\t"                  \
    "ldmia r1!, {r3} \n\t"             \
    "ldmia r2!, {r6} \n\t"             \
                                       \
    "umull r9, r10, r3, r6 \n\t"       \
    "stmia r0!, {r9, r10} \n\t"        \
                                       \
    "sub r0, 20 \n\t"                  \
    "sub r2, 16 \n\t"                  \
    "ldmia r2!, {r6, r7, r8} \n\t"     \
    "ldmia r1!, {r4, r5} \n\t"         \
                                       \
    "umull r9, r10, r3, r6 \n\t"       \
    "stmia r0!, {r9} \n\t"             \
                                       \
    "mov r14, #0 \n\t"                 \
    "umull r9, r12, r3, r7 \n\t"       \
    "adds r10, r10, r9 \n\t"           \
    "adc r12, r12, #0 \n\t"            \
    "umull r9, r11, r4, r6 \n\t"       \
    "adds r10, r10, r9 \n\t"           \
    "adcs r12, r12, r11 \n\t"          \
    "adc r14, r14, #0 \n\t"            \
    "stmia r0!, {r10} \n\t"            \
                                       \
    "mov r9, #0 \n\t"                  \
    "umull r10, r11, r3, r8 \n\t"      \
    "adds r12, r12, r10 \n\t"          \
    "adcs r14, r14, r11 \n\t"          \
    "adc r9, r9, #0 \n\t"              \
    "umull r10, r11, r4, r7 \n\t"      \
    "adds r12, r12, r10 \n\t"          \
    "adcs r14, r14, r11 \n\t"          \
    "adc r9, r9, #0 \n\t"              \
    "umull r10, r11, r5, r6 \n\t"      \
    "adds r12, r12, r10 \n\t"          \
    "adcs r14, r14, r11 \n\t"          \
    "adc r9, r9, #0 \n\t"              \
    "stmia r0!, {r12} \n\t"            \
                                       \
    "ldmia r1!, {r3} \n\t"             \
    "mov r10, #0 \n\t"                 \
    "umull r11, r12, r4, r8 \n\t"      \
    "adds r14, r14, r11 \n\t"          \
    "adcs r9, r9, r12 \n\t"            \
    "adc r10, r10, #0 \n\t"            \
    "umull r11, r12, r5, r7 \n\t"      \
    "adds r14, r14, r11 \n\t"          \
    "adcs r9, r9, r12 \n\t"            \
    "adc r10, r10, #0 \n\t"            \
    "umull r11, r12, r3, r6 \n\t"      \
    "adds r14, r14, r11 \n\t"          \
    "adcs r9, r9, r12 \n\t"            \
    "adc r10, r10, #0 \n\t"            \
    "ldr r11, [r0] \n\t"               \
    "adds r14, r14, r11 \n\t"          \
    "adcs r9, r9, #0 \n\t"             \
    "adc r10, r10, #0 \n\t"            \
    "stmia r0!, {r14} \n\t"            \
                                       \
    "ldmia r2!, {r6} \n\t"             \
    "mov r11, #0 \n\t"                 \
    "umull r12, r14, r4, r6 \n\t"      \
    "adds r9, r9, r12 \n\t"            \
    "adcs r10, r10, r14 \n\t"          \
    "adc r11, r11, #0 \n\t"            \
    "umull r12, r14, r5, r8 \n\t"      \
    "adds r9, r9, r12 \n\t"            \
    "adcs r10, r10, r14 \n\t"          \
    "adc r11, r11, #0 \n\t"            \
    "umull r12, r14, r3, r7 \n\t"      \
    "adds r9, r9, r12 \n\t"            \
    "adcs r10, r10, r14 \n\t"          \
    "adc r11, r11, #0 \n\t"            \
    "ldr r12, [r0] \n\t"               \
    "adds r9, r9, r12 \n\t"            \
    "adcs r10, r10, #0 \n\t"           \
    "adc r11, r11, #0 \n\t"            \
    "stmia r0!, {r9} \n\t"             \
                                       \
    "mov r12, #0 \n\t"                 \
    "umull r14, r9, r5, r6 \n\t"       \
    "adds r10, r10, r14 \n\t"          \
    "adcs r11, r11, r9 \n\t"           \
    "adc r12, r12, #0 \n\t"            \
    "umull r14, r9, r3, r8 \n\t"       \
    "adds r10, r10, r14 \n\t"          \
    "adcs r11, r11, r9 \n\t"           \
    "adc r12, r12, #0 \n\t"            \
    "stmia r0!, {r10} \n\t"            \
                                       \
    "umull r9, r10, r3, r6 \n\t"       \
    "adds r11, r11, r9 \n\t"           \
    "adc r12, r12, r10 \n\t"           \
    "stmia r0!, {r11, r12} \n\t"       \
                                       \
    "sub r0, 44 \n\t"                  \
    "sub r1, 16 \n\t"                  \
    "sub r2, 28 \n\t"                  \
    "ldmia r1!, {r3,r4,r5} \n\t"       \
    "ldmia r2!, {r6,r7,r8} \n\t"       \
                                       \
    "umull r9, r10, r3, r6 \n\t"       \
    "stmia r0!, {r9} \n\t"             \
                                       \
    "mov r14, #0 \n\t"                 \
    "umull r9, r12, r3, r7 \n\t"       \
    "adds r10, r10, r9 \n\t"           \
    "adc r12, r12, #0 \n\t"            \
    "umull r9, r11, r4, r6 \n\t"       \
    "adds r10, r10, r9 \n\t"           \
    "adcs r12, r12, r11 \n\t"          \
    "adc r14, r14, #0 \n\t"            \
    "stmia r0!, {r10} \n\t"            \
                                       \
    "mov r9, #0 \n\t"                  \
    "umull r10, r11, r3, r8 \n\t"      \
    "adds r12, r12, r10 \n\t"          \
    "adcs r14, r14, r11 \n\t"          \
    "adc r9, r9, #0 \n\t"              \
    "umull r10, r11, r4, r7 \n\t"      \
    "adds r12, r12, r10 \n\t"          \
    "adcs r14, r14, r11 \n\t"          \
    "adc r9, r9, #0 \n\t"              \
    "umull r10, r11, r5, r6 \n\t"      \
    "adds r12, r12, r10 \n\t"          \
    "adcs r14, r14, r11 \n\t"          \
    "adc r9, r9, #0 \n\t"              \
    "stmia r0!, {r12} \n\t"            \
                                       \
    "ldmia r1!, {r3} \n\t"             \
    "mov r10, #0 \n\t"                 \
    "umull r11, r12, r4, r8 \n\t"      \
    "adds r14, r14, r11 \n\t"          \
    "adcs r9, r9, r12 \n\t"            \
    "adc r10, r10, #0 \n\t"            \
    "umull r11, r12, r5, r7 \n\t"      \
    "adds r14, r14, r11 \n\t"          \
    "adcs r9, r9, r12 \n\t"            \
    "adc r10, r10, #0 \n\t"            \
    "umull r11, r12, r3, r6 \n\t"      \
    "adds r14, r14, r11 \n\t"          \
    "adcs r9, r9, r12 \n\t"            \
    "adc r10, r10, #0 \n\t"            \
    "ldr r11, [r0] \n\t"               \
    "adds r14, r14, r11 \n\t"          \
    "adcs r9, r9, #0 \n\t"             \
    "adc r10, r10, #0 \n\t"            \
    "stmia r0!, {r14} \n\t"            \
                                       \
    "ldmia r1!, {r4} \n\t"             \
    "mov r11, #0 \n\t"                 \
    "umull r12, r14, r5, r8 \n\t"      \
    "adds r9, r9, r12 \n\t"            \
    "adcs r10, r10, r14 \n\t"          \
    "adc r11, r11, #0 \n\t"            \
    "umull r12, r14, r3, r7 \n\t"      \
    "adds r9, r9, r12 \n\t"            \
    "adcs r10, r10, r14 \n\t"          \
    "adc r11, r11, #0 \n\t"            \
    "umull r12, r14, r4, r6 \n\t"      \
    "adds r9, r9, r12 \n\t"            \
    "adcs r10, r10, r14 \n\t"          \
    "adc r11, r11, #0 \n\t"            \
    "ldr r12, [r0] \n\t"               \
    "adds r9, r9, r12 \n\t"            \
    "adcs r10, r10, #0 \n\t"           \
    "adc r11, r11, #0 \n\t"            \
    "stmia r0!, {r9} \n\t"             \
                                       \
    "ldmia r1!, {r5} \n\t"             \
    "mov r12, #0 \n\t"                 \
    "umull r14, r9, r3, r8 \n\t"       \
    "adds r10, r10, r14 \n\t"          \
    "adcs r11, r11, r9 \n\t"           \
    "adc r12, r12, #0 \n\t"            \
    "umull r14, r9, r4, r7 \n\t"       \
    "adds r10, r10, r14 \n\t"          \
    "adcs r11, r11, r9 \n\t"           \
    "adc r12, r12, #0 \n\t"            \
    "umull r14, r9, r5, r6 \n\t"       \
    "adds r10, r10, r14 \n\t"          \
    "adcs r11, r11, r9 \n\t"           \
    "adc r12, r12, #0 \n\t"            \
    "ldr r14, [r0] \n\t"               \
    "adds r10, r10, r14 \n\t"          \
    "adcs r11, r11, #0 \n\t"           \
    "adc r12, r12, #0 \n\t"            \
    "stmia r0!, {r10} \n\t"            \
                                       \
    "ldmia r1!, {r3} \n\t"             \
    "mov r14, #0 \n\t"                 \
    "umull r9, r10, r4, r8 \n\t"       \
    "adds r11, r11, r9 \n\t"           \
    "adcs r12, r12, r10 \n\t"          \
    "adc r14, r14, #0 \n\t"            \
    "umull r9, r10, r5, r7 \n\t"       \
    "adds r11, r11, r9 \n\t"           \
    "adcs r12, r12, r10 \n\t"          \
    "adc r14, r14, #0 \n\t"            \
    "umull r9, r10, r3, r6 \n\t"       \
    "adds r11, r11, r9 \n\t"           \
    "adcs r12, r12, r10 \n\t"          \
    "adc r14, r14, #0 \n\t"            \
    "ldr r9, [r0] \n\t"                \
    "adds r11, r11, r9 \n\t"           \
    "adcs r12, r12, #0 \n\t"           \
    "adc r14, r14, #0 \n\t"            \
    "stmia r0!, {r11} \n\t"            \
                                       \
    "ldmia r2!, {r6} \n\t"             \
    "mov r9, #0 \n\t"                  \
    "umull r10, r11, r4, r6 \n\t"      \
    "adds r12, r12, r10 \n\t"          \
    "adcs r14, r14, r11 \n\t"          \
    "adc r9, r9, #0 \n\t"              \
    "umull r10, r11, r5, r8 \n\t"      \
    "adds r12, r12, r10 \n\t"          \
    "adcs r14, r14, r11 \n\t"          \
    "adc r9, r9, #0 \n\t"              \
    "umull r10, r11, r3, r7 \n\t"      \
    "adds r12, r12, r10 \n\t"          \
    "adcs r14, r14, r11 \n\t"          \
    "adc r9, r9, #0 \n\t"              \
    "ldr r10, [r0] \n\t"               \
    "adds r12, r12, r10 \n\t"          \
    "adcs r14, r14, #0 \n\t"           \
    "adc r9, r9, #0 \n\t"              \
    "stmia r0!, {r12} \n\t"            \
                                       \
    "ldmia r2!, {r7} \n\t"             \
    "mov r10, #0 \n\t"                 \
    "umull r11, r12, r4, r7 \n\t"      \
    "adds r14, r14, r11 \n\t"          \
    "adcs r9, r9, r12 \n\t"            \
    "adc r10, r10, #0 \n\t"            \
    "umull r11, r12, r5, r6 \n\t"      \
    "adds r14, r14, r11 \n\t"          \
    "adcs r9, r9, r12 \n\t"            \
    "adc r10, r10, #0 \n\t"            \
    "umull r11, r12, r3, r8 \n\t"      \
    "adds r14, r14, r11 \n\t"          \
    "adcs r9, r9, r12 \n\t"            \
    "adc r10, r10, #0 \n\t"            \
    "ldr r11, [r0] \n\t"               \
    "adds r14, r14, r11 \n\t"          \
    "adcs r9, r9, #0 \n\t"             \
    "adc r10, r10, #0 \n\t"            \
    "stmia r0!, {r14} \n\t"            \
                                       \
    "ldmia r2!, {r8} \n\t"             \
    "mov r11, #0 \n\t"                 \
    "umull r12, r14, r4, r8 \n\t"      \
    "adds r9, r9, r12 \n\t"            \
    "adcs r10, r10, r14 \n\t"          \
    "adc r11, r11, #0 \n\t"            \
    "umull r12, r14, r5, r7 \n\t"      \
    "adds r9, r9, r12 \n\t"            \
    "adcs r10, r10, r14 \n\t"          \
    "adc r11, r11, #0 \n\t"            \
    "umull r12, r14, r3, r6 \n\t"      \
    "adds r9, r9, r12 \n\t"            \
    "adcs r10, r10, r14 \n\t"          \
    "adc r11, r11, #0 \n\t"            \
    "ldr r12, [r0] \n\t"               \
    "adds r9, r9, r12 \n\t"            \
    "adcs r10, r10, #0 \n\t"           \
    "adc r11, r11, #0 \n\t"            \
    "stmia r0!, {r9} \n\t"             \
                                       \
    "ldmia r2!, {r6} \n\t"             \
    "mov r12, #0 \n\t"                 \
    "umull r14, r9, r4, r6 \n\t"       \
    "adds r10, r10, r14 \n\t"          \
    "adcs r11, r11, r9 \n\t"           \
    "adc r12, r12, #0 \n\t"            \
    "umull r14, r9, r5, r8 \n\t"       \
    "adds r10, r10, r14 \n\t"          \
    "adcs r11, r11, r9 \n\t"           \
    "adc r12, r12, #0 \n\t"            \
    "umull r14, r9, r3, r7 \n\t"       \
    "adds r10, r10, r14 \n\t"          \
    "adcs r11, r11, r9 \n\t"           \
    "adc r12, r12, #0 \n\t"            \
    "ldr r14, [r0] \n\t"               \
    "adds r10, r10, r14 \n\t"          \
    "adcs r11, r11, #0 \n\t"           \
    "adc r12, r12, #0 \n\t"            \
    "stmia r0!, {r10} \n\t"            \
                                       \
    "mov r14, #0 \n\t"                 \
    "umull r9, r10, r5, r6 \n\t"       \
    "adds r11, r11, r9 \n\t"           \
    "adcs r12, r12, r10 \n\t"          \
    "adc r14, r14, #0 \n\t"            \
    "umull r9, r10, r3, r8 \n\t"       \
    "adds r11, r11, r9 \n\t"           \
    "adcs r12, r12, r10 \n\t"          \
    "adc r14, r14, #0 \n\t"            \
    "stmia r0!, {r11} \n\t"            \
                                       \
    "umull r10, r11, r3, r6 \n\t"      \
    "adds r12, r12, r10 \n\t"          \
    "adc r14, r14, r11 \n\t"           \
    "stmia r0!, {r12, r14} \n\t"

#define FAST_MULT_ASM_8             \
    "add r0, 24 \n\t"               \
    "add r2, 24 \n\t"               \
    "ldmia r1!, {r3,r4} \n\t"       \
    "ldmia r2!, {r6,r7} \n\t"       \
                                    \
    "umull r11, r12, r3, r6 \n\t"   \
    "stmia r0!, {r11} \n\t"         \
                                    \
    "mov r10, #0 \n\t"              \
    "umull r11, r9, r3, r7 \n\t"    \
    "adds r12, r12, r11 \n\t"       \
    "adc r9, r9, #0 \n\t"           \
    "umull r11, r14, r4, r6 \n\t"   \
    "adds r12, r12, r11 \n\t"       \
    "adcs r9, r9, r14 \n\t"         \
    "adc r10, r10, #0 \n\t"         \
    "stmia r0!, {r12} \n\t"         \
                                    \
    "umull r12, r14, r4, r7 \n\t"   \
    "adds r9, r9, r12 \n\t"         \
    "adc r10, r10, r14 \n\t"        \
    "stmia r0!, {r9, r10} \n\t"     \
                                    \
    "sub r0, 28 \n\t"               \
    "sub r2, 20 \n\t"               \
    "ldmia r2!, {r6,r7,r8} \n\t"    \
    "ldmia r1!, {r5} \n\t"          \
                                    \
    "umull r11, r12, r3, r6 \n\t"   \
    "stmia r0!, {r11} \n\t"         \
                                    \
    "mov r10, #0 \n\t"              \
    "umull r11, r9, r3, r7 \n\t"    \
    "adds r12, r12, r11 \n\t"       \
    "adc r9, r9, #0 \n\t"           \
    "umull r11, r14, r4, r6 \n\t"   \
    "adds r12, r12, r11 \n\t"       \
    "adcs r9, r9, r14 \n\t"         \
    "adc r10, r10, #0 \n\t"         \
    "stmia r0!, {r12} \n\t"         \
                                    \
    "mov r11, #0 \n\t"              \
    "umull r12, r14, r3, r8 \n\t"   \
    "adds r9, r9, r12 \n\t"         \
    "adcs r10, r10, r14 \n\t"       \
    "adc r11, r11, #0 \n\t"         \
    "umull r12, r14, r4, r7 \n\t"   \
    "adds r9, r9, r12 \n\t"         \
    "adcs r10, r10, r14 \n\t"       \
    "adc r11, r11, #0 \n\t"         \
    "umull r12, r14, r5, r6 \n\t"   \
    "adds r9, r9, r12 \n\t"         \
    "adcs r10, r10, r14 \n\t"       \
    "adc r11, r11, #0 \n\t"         \
    "stmia r0!, {r9} \n\t"          \
                                    \
    "ldmia r1!, {r3} \n\t"          \
    "mov r12, #0 \n\t"              \
    "umull r14, r9, r4, r8 \n\t"    \
    "adds r10, r10, r14 \n\t"       \
    "adcs r11, r11, r9 \n\t"        \
    "adc r12, r12, #0 \n\t"         \
    "umull r14, r9, r5, r7 \n\t"    \
    "adds r10, r10, r14 \n\t"       \
    "adcs r11, r11, r9 \n\t"        \
    "adc r12, r12, #0 \n\t"         \
    "umull r14, r9, r3, r6 \n\t"    \
    "adds r10, r10, r14 \n\t"       \
    "adcs r11, r11, r9 \n\t"        \
    "adc r12, r12, #0 \n\t"         \
    "ldr r14, [r0] \n\t"            \
    "adds r10, r10, r14 \n\t"       \
    "adcs r11, r11, #0 \n\t"        \
    "adc r12, r12, #0 \n\t"         \
    "stmia r0!, {r10} \n\t"         \
                                    \
    "ldmia r1!, {r4} \n\t"          \
    "mov r14, #0 \n\t"              \
    "umull r9, r10, r5, r8 \n\t"    \
    "adds r11, r11, r9 \n\t"        \
    "adcs r12, r12, r10 \n\t"       \
    "adc r14, r14, #0 \n\t"         \
    "umull r9, r10, r3, r7 \n\t"    \
    "adds r11, r11, r9 \n\t"        \
    "adcs r12, r12, r10 \n\t"       \
    "adc r14, r14, #0 \n\t"         \
    "umull r9, r10, r4, r6 \n\t"    \
    "adds r11, r11, r9 \n\t"        \
    "adcs r12, r12, r10 \n\t"       \
    "adc r14, r14, #0 \n\t"         \
    "ldr r9, [r0] \n\t"             \
    "adds r11, r11, r9 \n\t"        \
    "adcs r12, r12, #0 \n\t"        \
    "adc r14, r14, #0 \n\t"         \
    "stmia r0!, {r11} \n\t"         \
                                    \
    "ldmia r2!, {r6} \n\t"          \
    "mov r9, #0 \n\t"               \
    "umull r10, r11, r5, r6 \n\t"   \
    "adds r12, r12, r10 \n\t"       \
    "adcs r14, r14, r11 \n\t"       \
    "adc r9, r9, #0 \n\t"           \
    "umull r10, r11, r3, r8 \n\t"   \
    "adds r12, r12, r10 \n\t"       \
    "adcs r14, r14, r11 \n\t"       \
    "adc r9, r9, #0 \n\t"           \
    "umull r10, r11, r4, r7 \n\t"   \
    "adds r12, r12, r10 \n\t"       \
    "adcs r14, r14, r11 \n\t"       \
    "adc r9, r9, #0 \n\t"           \
    "ldr r10, [r0] \n\t"            \
    "adds r12, r12, r10 \n\t"       \
    "adcs r14, r14, #0 \n\t"        \
    "adc r9, r9, #0 \n\t"           \
    "stmia r0!, {r12} \n\t"         \
                                    \
    "ldmia r2!, {r7} \n\t"          \
    "mov r10, #0 \n\t"              \
    "umull r11, r12, r5, r7 \n\t"   \
    "adds r14, r14, r11 \n\t"       \
    "adcs r9, r9, r12 \n\t"         \
    "adc r10, r10, #0 \n\t"         \
    "umull r11, r12, r3, r6 \n\t"   \
    "adds r14, r14, r11 \n\t"       \
    "adcs r9, r9, r12 \n\t"         \
    "adc r10, r10, #0 \n\t"         \
    "umull r11, r12, r4, r8 \n\t"   \
    "adds r14, r14, r11 \n\t"       \
    "adcs r9, r9, r12 \n\t"         \
    "adc r10, r10, #0 \n\t"         \
    "ldr r11, [r0] \n\t"            \
    "adds r14, r14, r11 \n\t"       \
    "adcs r9, r9, #0 \n\t"          \
    "adc r10, r10, #0 \n\t"         \
    "stmia r0!, {r14} \n\t"         \
                                    \
    "mov r11, #0 \n\t"              \
    "umull r12, r14, r3, r7 \n\t"   \
    "adds r9, r9, r12 \n\t"         \
    "adcs r10, r10, r14 \n\t"       \
    "adc r11, r11, #0 \n\t"         \
    "umull r12, r14, r4, r6 \n\t"   \
    "adds r9, r9, r12 \n\t"         \
    "adcs r10, r10, r14 \n\t"       \
    "adc r11, r11, #0 \n\t"         \
    "stmia r0!, {r9} \n\t"          \
                                    \
    "umull r14, r9, r4, r7 \n\t"    \
    "adds r10, r10, r14 \n\t"       \
    "adc r11, r11, r9 \n\t"         \
    "stmia r0!, {r10, r11} \n\t"    \
                                    \
    "sub r0, 52 \n\t"               \
    "sub r1, 20 \n\t"               \
    "sub r2, 32 \n\t"               \
    "ldmia r1!, {r3,r4,r5} \n\t"    \
    "ldmia r2!, {r6,r7,r8} \n\t"    \
                                    \
    "umull r11, r12, r3, r6 \n\t"   \
    "stmia r0!, {r11} \n\t"         \
                                    \
    "mov r10, #0 \n\t"              \
    "umull r11, r9, r3, r7 \n\t"    \
    "adds r12, r12, r11 \n\t"       \
    "adc r9, r9, #0 \n\t"           \
    "umull r11, r14, r4, r6 \n\t"   \
    "adds r12, r12, r11 \n\t"       \
    "adcs r9, r9, r14 \n\t"         \
    "adc r10, r10, #0 \n\t"         \
    "stmia r0!, {r12} \n\t"         \
                                    \
    "mov r11, #0 \n\t"              \
    "umull r12, r14, r3, r8 \n\t"   \
    "adds r9, r9, r12 \n\t"         \
    "adcs r10, r10, r14 \n\t"       \
    "adc r11, r11, #0 \n\t"         \
    "umull r12, r14, r4, r7 \n\t"   \
    "adds r9, r9, r12 \n\t"         \
    "adcs r10, r10, r14 \n\t"       \
    "adc r11, r11, #0 \n\t"         \
    "umull r12, r14, r5, r6 \n\t"   \
    "adds r9, r9, r12 \n\t"         \
    "adcs r10, r10, r14 \n\t"       \
    "adc r11, r11, #0 \n\t"         \
    "stmia r0!, {r9} \n\t"          \
                                    \
    "ldmia r1!, {r3} \n\t"          \
    "mov r12, #0 \n\t"              \
    "umull r14, r9, r4, r8 \n\t"    \
    "adds r10, r10, r14 \n\t"       \
    "adcs r11, r11, r9 \n\t"        \
    "adc r12, r12, #0 \n\t"         \
    "umull r14, r9, r5, r7 \n\t"    \
    "adds r10, r10, r14 \n\t"       \
    "adcs r11, r11, r9 \n\t"        \
    "adc r12, r12, #0 \n\t"         \
    "umull r14, r9, r3, r6 \n\t"    \
    "adds r10, r10, r14 \n\t"       \
    "adcs r11, r11, r9 \n\t"        \
    "adc r12, r12, #0 \n\t"         \
    "ldr r14, [r0] \n\t"            \
    "adds r10, r10, r14 \n\t"       \
    "adcs r11, r11, #0 \n\t"        \
    "adc r12, r12, #0 \n\t"         \
    "stmia r0!, {r10} \n\t"         \
                                    \
    "ldmia r1!, {r4} \n\t"          \
    "mov r14, #0 \n\t"              \
    "umull r9, r10, r5, r8 \n\t"    \
    "adds r11, r11, r9 \n\t"        \
    "adcs r12, r12, r10 \n\t"       \
    "adc r14, r14, #0 \n\t"         \
    "umull r9, r10, r3, r7 \n\t"    \
    "adds r11, r11, r9 \n\t"        \
    "adcs r12, r12, r10 \n\t"       \
    "adc r14, r14, #0 \n\t"         \
    "umull r9, r10, r4, r6 \n\t"    \
    "adds r11, r11, r9 \n\t"        \
    "adcs r12, r12, r10 \n\t"       \
    "adc r14, r14, #0 \n\t"         \
    "ldr r9, [r0] \n\t"             \
    "adds r11, r11, r9 \n\t"        \
    "adcs r12, r12, #0 \n\t"        \
    "adc r14, r14, #0 \n\t"         \
    "stmia r0!, {r11} \n\t"         \
                                    \
    "ldmia r1!, {r5} \n\t"          \
    "mov r9, #0 \n\t"               \
    "umull r10, r11, r3, r8 \n\t"   \
    "adds r12, r12, r10 \n\t"       \
    "adcs r14, r14, r11 \n\t"       \
    "adc r9, r9, #0 \n\t"           \
    "umull r10, r11, r4, r7 \n\t"   \
    "adds r12, r12, r10 \n\t"       \
    "adcs r14, r14, r11 \n\t"       \
    "adc r9, r9, #0 \n\t"           \
    "umull r10, r11, r5, r6 \n\t"   \
    "adds r12, r12, r10 \n\t"       \
    "adcs r14, r14, r11 \n\t"       \
    "adc r9, r9, #0 \n\t"           \
    "ldr r10, [r0] \n\t"            \
    "adds r12, r12, r10 \n\t"       \
    "adcs r14, r14, #0 \n\t"        \
    "adc r9, r9, #0 \n\t"           \
    "stmia r0!, {r12} \n\t"         \
                                    \
    "ldmia r1!, {r3} \n\t"          \
    "mov r10, #0 \n\t"              \
    "umull r11, r12, r4, r8 \n\t"   \
    "adds r14, r14, r11 \n\t"       \
    "adcs r9, r9, r12 \n\t"         \
    "adc r10, r10, #0 \n\t"         \
    "umull r11, r12, r5, r7 \n\t"   \
    "adds r14, r14, r11 \n\t"       \
    "adcs r9, r9, r12 \n\t"         \
    "adc r10, r10, #0 \n\t"         \
    "umull r11, r12, r3, r6 \n\t"   \
    "adds r14, r14, r11 \n\t"       \
    "adcs r9, r9, r12 \n\t"         \
    "adc r10, r10, #0 \n\t"         \
    "ldr r11, [r0] \n\t"            \
    "adds r14, r14, r11 \n\t"       \
    "adcs r9, r9, #0 \n\t"          \
    "adc r10, r10, #0 \n\t"         \
    "stmia r0!, {r14} \n\t"         \
                                    \
    "ldmia r1!, {r4} \n\t"          \
    "mov r11, #0 \n\t"              \
    "umull r12, r14, r5, r8 \n\t"   \
    "adds r9, r9, r12 \n\t"         \
    "adcs r10, r10, r14 \n\t"       \
    "adc r11, r11, #0 \n\t"         \
    "umull r12, r14, r3, r7 \n\t"   \
    "adds r9, r9, r12 \n\t"         \
    "adcs r10, r10, r14 \n\t"       \
    "adc r11, r11, #0 \n\t"         \
    "umull r12, r14, r4, r6 \n\t"   \
    "adds r9, r9, r12 \n\t"         \
    "adcs r10, r10, r14 \n\t"       \
    "adc r11, r11, #0 \n\t"         \
    "ldr r12, [r0] \n\t"            \
    "adds r9, r9, r12 \n\t"         \
    "adcs r10, r10, #0 \n\t"        \
    "adc r11, r11, #0 \n\t"         \
    "stmia r0!, {r9} \n\t"          \
                                    \
    "ldmia r2!, {r6} \n\t"          \
    "mov r12, #0 \n\t"              \
    "umull r14, r9, r5, r6 \n\t"    \
    "adds r10, r10, r14 \n\t"       \
    "adcs r11, r11, r9 \n\t"        \
    "adc r12, r12, #0 \n\t"         \
    "umull r14, r9, r3, r8 \n\t"    \
    "adds r10, r10, r14 \n\t"       \
    "adcs r11, r11, r9 \n\t"        \
    "adc r12, r12, #0 \n\t"         \
    "umull r14, r9, r4, r7 \n\t"    \
    "adds r10, r10, r14 \n\t"       \
    "adcs r11, r11, r9 \n\t"        \
    "adc r12, r12, #0 \n\t"         \
    "ldr r14, [r0] \n\t"            \
    "adds r10, r10, r14 \n\t"       \
    "adcs r11, r11, #0 \n\t"        \
    "adc r12, r12, #0 \n\t"         \
    "stmia r0!, {r10} \n\t"         \
                                    \
    "ldmia r2!, {r7} \n\t"          \
    "mov r14, #0 \n\t"              \
    "umull r9, r10, r5, r7 \n\t"    \
    "adds r11, r11, r9 \n\t"        \
    "adcs r12, r12, r10 \n\t"       \
    "adc r14, r14, #0 \n\t"         \
    "umull r9, r10, r3, r6 \n\t"    \
    "adds r11, r11, r9 \n\t"        \
    "adcs r12, r12, r10 \n\t"       \
    "adc r14, r14, #0 \n\t"         \
    "umull r9, r10, r4, r8 \n\t"    \
    "adds r11, r11, r9 \n\t"        \
    "adcs r12, r12, r10 \n\t"       \
    "adc r14, r14, #0 \n\t"         \
    "ldr r9, [r0] \n\t"             \
    "adds r11, r11, r9 \n\t"        \
    "adcs r12, r12, #0 \n\t"        \
    "adc r14, r14, #0 \n\t"         \
    "stmia r0!, {r11} \n\t"         \
                                    \
    "ldmia r2!, {r8} \n\t"          \
    "mov r9, #0 \n\t"               \
    "umull r10, r11, r5, r8 \n\t"   \
    "adds r12, r12, r10 \n\t"       \
    "adcs r14, r14, r11 \n\t"       \
    "adc r9, r9, #0 \n\t"           \
    "umull r10, r11, r3, r7 \n\t"   \
    "adds r12, r12, r10 \n\t"       \
    "adcs r14, r14, r11 \n\t"       \
    "adc r9, r9, #0 \n\t"           \
    "umull r10, r11, r4, r6 \n\t"   \
    "adds r12, r12, r10 \n\t"       \
    "adcs r14, r14, r11 \n\t"       \
    "adc r9, r9, #0 \n\t"           \
    "ldr r10, [r0] \n\t"            \
    "adds r12, r12, r10 \n\t"       \
    "adcs r14, r14, #0 \n\t"        \
    "adc r9, r9, #0 \n\t"           \
    "stmia r0!, {r12} \n\t"         \
                                    \
    "ldmia r2!, {r6} \n\t"          \
    "mov r10, #0 \n\t"              \
    "umull r11, r12, r5, r6 \n\t"   \
    "adds r14, r14, r11 \n\t"       \
    "adcs r9, r9, r12 \n\t"         \
    "adc r10, r10, #0 \n\t"         \
    "umull r11, r12, r3, r8 \n\t"   \
    "adds r14, r14, r11 \n\t"       \
    "adcs r9, r9, r12 \n\t"         \
    "adc r10, r10, #0 \n\t"         \
    "umull r11, r12, r4, r7 \n\t"   \
    "adds r14, r14, r11 \n\t"       \
    "adcs r9, r9, r12 \n\t"         \
    "adc r10, r10, #0 \n\t"         \
    "ldr r11, [r0] \n\t"            \
    "adds r14, r14, r11 \n\t"       \
    "adcs r9, r9, #0 \n\t"          \
    "adc r10, r10, #0 \n\t"         \
    "stmia r0!, {r14} \n\t"         \
                                    \
    "ldmia r2!, {r7} \n\t"          \
    "mov r11, #0 \n\t"              \
    "umull r12, r14, r5, r7 \n\t"   \
    "adds r9, r9, r12 \n\t"         \
    "adcs r10, r10, r14 \n\t"       \
    "adc r11, r11, #0 \n\t"         \
    "umull r12, r14, r3, r6 \n\t"   \
    "adds r9, r9, r12 \n\t"         \
    "adcs r10, r10, r14 \n\t"       \
    "adc r11, r11, #0 \n\t"         \
    "umull r12, r14, r4, r8 \n\t"   \
    "adds r9, r9, r12 \n\t"         \
    "adcs r10, r10, r14 \n\t"       \
    "adc r11, r11, #0 \n\t"         \
    "ldr r12, [r0] \n\t"            \
    "adds r9, r9, r12 \n\t"         \
    "adcs r10, r10, #0 \n\t"        \
    "adc r11, r11, #0 \n\t"         \
    "stmia r0!, {r9} \n\t"          \
                                    \
    "mov r12, #0 \n\t"              \
    "umull r14, r9, r3, r7 \n\t"    \
    "adds r10, r10, r14 \n\t"       \
    "adcs r11, r11, r9 \n\t"        \
    "adc r12, r12, #0 \n\t"         \
    "umull r14, r9, r4, r6 \n\t"    \
    "adds r10, r10, r14 \n\t"       \
    "adcs r11, r11, r9 \n\t"        \
    "adc r12, r12, #0 \n\t"         \
    "stmia r0!, {r10} \n\t"         \
                                    \
    "umull r9, r10, r4, r7 \n\t"    \
    "adds r11, r11, r9 \n\t"        \
    "adc r12, r12, r10 \n\t"        \
    "stmia r0!, {r11, r12} \n\t"

#define FAST_SQUARE_ASM_5               \
    "ldmia r1!, {r2,r3,r4,r5,r6} \n\t"  \
                                        \
    "umull r11, r12, r2, r2 \n\t"       \
    "stmia r0!, {r11} \n\t"             \
                                        \
    "mov r9, #0 \n\t"                   \
    "umull r10, r11, r2, r3 \n\t"       \
    "adds r12, r12, r10 \n\t"           \
    "adcs r8, r11, #0 \n\t"             \
    "adc r9, r9, #0 \n\t"               \
    "adds r12, r12, r10 \n\t"           \
    "adcs r8, r8, r11 \n\t"             \
    "adc r9, r9, #0 \n\t"               \
    "stmia r0!, {r12} \n\t"             \
                                        \
    "mov r10, #0 \n\t"                  \
    "umull r11, r12, r2, r4 \n\t"       \
    "adds r11, r11, r11 \n\t"           \
    "adcs r12, r12, r12 \n\t"           \
    "adc r10, r10, #0 \n\t"             \
    "adds r8, r8, r11 \n\t"             \
    "adcs r9, r9, r12 \n\t"             \
    "adc r10, r10, #0 \n\t"             \
    "umull r11, r12, r3, r3 \n\t"       \
    "adds r8, r8, r11 \n\t"             \
    "adcs r9, r9, r12 \n\t"             \
    "adc r10, r10, #0 \n\t"             \
    "stmia r0!, {r8} \n\t"              \
                                        \
    "mov r12, #0 \n\t"                  \
    "umull r8, r11, r2, r5 \n\t"        \
    "umull r1, r14, r3, r4 \n\t"        \
    "adds r8, r8, r1 \n\t"              \
    "adcs r11, r11, r14 \n\t"           \
    "adc r12, r12, #0 \n\t"             \
    "adds r8, r8, r8 \n\t"              \
    "adcs r11, r11, r11 \n\t"           \
    "adc r12, r12, r12 \n\t"            \
    "adds r8, r8, r9 \n\t"              \
    "adcs r11, r11, r10 \n\t"           \
    "adc r12, r12, #0 \n\t"             \
    "stmia r0!, {r8} \n\t"              \
                                        \
    "mov r10, #0 \n\t"                  \
    "umull r8, r9, r2, r6 \n\t"         \
    "umull r1, r14, r3, r5 \n\t"        \
    "adds r8, r8, r1 \n\t"              \
    "adcs r9, r9, r14 \n\t"             \
    "adc r10, r10, #0 \n\t"             \
    "adds r8, r8, r8 \n\t"              \
    "adcs r9, r9, r9 \n\t"              \
    "adc r10, r10, r10 \n\t"            \
    "umull r1, r14, r4, r4 \n\t"        \
    "adds r8, r8, r1 \n\t"              \
    "adcs r9, r9, r14 \n\t"             \
    "adc r10, r10, #0 \n\t"             \
    "adds r8, r8, r11 \n\t"             \
    "adcs r9, r9, r12 \n\t"             \
    "adc r10, r10, #0 \n\t"             \
    "stmia r0!, {r8} \n\t"              \
                                        \
    "mov r12, #0 \n\t"                  \
    "umull r8, r11, r3, r6 \n\t"        \
    "umull r1, r14, r4, r5 \n\t"        \
    "adds r8, r8, r1 \n\t"              \
    "adcs r11, r11, r14 \n\t"           \
    "adc r12, r12, #0 \n\t"             \
    "adds r8, r8, r8 \n\t"              \
    "adcs r11, r11, r11 \n\t"           \
    "adc r12, r12, r12 \n\t"            \
    "adds r8, r8, r9 \n\t"              \
    "adcs r11, r11, r10 \n\t"           \
    "adc r12, r12, #0 \n\t"             \
    "stmia r0!, {r8} \n\t"              \
                                        \
    "mov r8, #0 \n\t"                   \
    "umull r1, r10, r4, r6 \n\t"        \
    "adds r1, r1, r1 \n\t"              \
    "adcs r10, r10, r10 \n\t"           \
    "adc r8, r8, #0 \n\t"               \
    "adds r11, r11, r1 \n\t"            \
    "adcs r12, r12, r10 \n\t"           \
    "adc r8, r8, #0 \n\t"               \
    "umull r1, r10, r5, r5 \n\t"        \
    "adds r11, r11, r1 \n\t"            \
    "adcs r12, r12, r10 \n\t"           \
    "adc r8, r8, #0 \n\t"               \
    "stmia r0!, {r11} \n\t"             \
                                        \
    "mov r11, #0 \n\t"                  \
    "umull r1, r10, r5, r6 \n\t"        \
    "adds r1, r1, r1 \n\t"              \
    "adcs r10, r10, r10 \n\t"           \
    "adc r11, r11, #0 \n\t"             \
    "adds r12, r12, r1 \n\t"            \
    "adcs r8, r8, r10 \n\t"             \
    "adc r11, r11, #0 \n\t"             \
    "stmia r0!, {r12} \n\t"             \
                                        \
    "umull r1, r10, r6, r6 \n\t"        \
    "adds r8, r8, r1 \n\t"              \
    "adcs r11, r11, r10 \n\t"           \
    "stmia r0!, {r8, r11} \n\t"

#define FAST_SQUARE_ASM_6                  \
    "ldmia r1!, {r2,r3,r4,r5,r6,r7} \n\t"  \
                                           \
    "umull r11, r12, r2, r2 \n\t"          \
    "stmia r0!, {r11} \n\t"                \
                                           \
    "mov r9, #0 \n\t"                      \
    "umull r10, r11, r2, r3 \n\t"          \
    "adds r12, r12, r10 \n\t"              \
    "adcs r8, r11, #0 \n\t"                \
    "adc r9, r9, #0 \n\t"                  \
    "adds r12, r12, r10 \n\t"              \
    "adcs r8, r8, r11 \n\t"                \
    "adc r9, r9, #0 \n\t"                  \
    "stmia r0!, {r12} \n\t"                \
                                           \
    "mov r10, #0 \n\t"                     \
    "umull r11, r12, r2, r4 \n\t"          \
    "adds r11, r11, r11 \n\t"              \
    "adcs r12, r12, r12 \n\t"              \
    "adc r10, r10, #0 \n\t"                \
    "adds r8, r8, r11 \n\t"                \
    "adcs r9, r9, r12 \n\t"                \
    "adc r10, r10, #0 \n\t"                \
    "umull r11, r12, r3, r3 \n\t"          \
    "adds r8, r8, r11 \n\t"                \
    "adcs r9, r9, r12 \n\t"                \
    "adc r10, r10, #0 \n\t"                \
    "stmia r0!, {r8} \n\t"                 \
                                           \
    "mov r12, #0 \n\t"                     \
    "umull r8, r11, r2, r5 \n\t"           \
    "umull r1, r14, r3, r4 \n\t"           \
    "adds r8, r8, r1 \n\t"                 \
    "adcs r11, r11, r14 \n\t"              \
    "adc r12, r12, #0 \n\t"                \
    "adds r8, r8, r8 \n\t"                 \
    "adcs r11, r11, r11 \n\t"              \
    "adc r12, r12, r12 \n\t"               \
    "adds r8, r8, r9 \n\t"                 \
    "adcs r11, r11, r10 \n\t"              \
    "adc r12, r12, #0 \n\t"                \
    "stmia r0!, {r8} \n\t"                 \
                                           \
    "mov r10, #0 \n\t"                     \
    "umull r8, r9, r2, r6 \n\t"            \
    "umull r1, r14, r3, r5 \n\t"           \
    "adds r8, r8, r1 \n\t"                 \
    "adcs r9, r9, r14 \n\t"                \
    "adc r10, r10, #0 \n\t"                \
    "adds r8, r8, r8 \n\t"                 \
    "adcs r9, r9, r9 \n\t"                 \
    "adc r10, r10, r10 \n\t"               \
    "umull r1, r14, r4, r4 \n\t"           \
    "adds r8, r8, r1 \n\t"                 \
    "adcs r9, r9, r14 \n\t"                \
    "adc r10, r10, #0 \n\t"                \
    "adds r8, r8, r11 \n\t"                \
    "adcs r9, r9, r12 \n\t"                \
    "adc r10, r10, #0 \n\t"                \
    "stmia r0!, {r8} \n\t"                 \
                                           \
    "mov r12, #0 \n\t"                     \
    "umull r8, r11, r2, r7 \n\t"           \
    "umull r1, r14, r3, r6 \n\t"           \
    "adds r8, r8, r1 \n\t"                 \
    "adcs r11, r11, r14 \n\t"              \
    "adc r12, r12, #0 \n\t"                \
    "umull r1, r14, r4, r5 \n\t"           \
    "adds r8, r8, r1 \n\t"                 \
    "adcs r11, r11, r14 \n\t"              \
    "adc r12, r12, #0 \n\t"                \
    "adds r8, r8, r8 \n\t"                 \
    "adcs r11, r11, r11 \n\t"              \
    "adc r12, r12, r12 \n\t"               \
    "adds r8, r8, r9 \n\t"                 \
    "adcs r11, r11, r10 \n\t"              \
    "adc r12, r12, #0 \n\t"                \
    "stmia r0!, {r8} \n\t"                 \
                                           \
    "mov r10, #0 \n\t"                     \
    "umull r8, r9, r3, r7 \n\t"            \
    "umull r1, r14, r4, r6 \n\t"           \
    "adds r8, r8, r1 \n\t"                 \
    "adcs r9, r9, r14 \n\t"                \
    "adc r10, r10, #0 \n\t"                \
    "adds r8, r8, r8 \n\t"                 \
    "adcs r9, r9, r9 \n\t"                 \
    "adc r10, r10, r10 \n\t"               \
    "umull r1, r14, r5, r5 \n\t"           \
    "adds r8, r8, r1 \n\t"                 \
    "adcs r9, r9, r14 \n\t"                \
    "adc r10, r10, #0 \n\t"                \
    "adds r8, r8, r11 \n\t"                \
    "adcs r9, r9, r12 \n\t"                \
    "adc r10, r10, #0 \n\t"                \
    "stmia r0!, {r8} \n\t"                 \
                                           \
    "mov r12, #0 \n\t"                     \
    "umull r8, r11, r4, r7 \n\t"           \
    "umull r1, r14, r5, r6 \n\t"           \
    "adds r8, r8, r1 \n\t"                 \
    "adcs r11, r11, r14 \n\t"              \
    "adc r12, r12, #0 \n\t"                \
    "adds r8, r8, r8 \n\t"                 \
    "adcs r11, r11, r11 \n\t"              \
    "adc r12, r12, r12 \n\t"               \
    "adds r8, r8, r9 \n\t"                 \
    "adcs r11, r11, r10 \n\t"              \
    "adc r12, r12, #0 \n\t"                \
    "stmia r0!, {r8} \n\t"                 \
                                           \
    "mov r8, #0 \n\t"                      \
    "umull r1, r10, r5, r7 \n\t"           \
    "adds r1, r1, r1 \n\t"                 \
    "adcs r10, r10, r10 \n\t"              \
    "adc r8, r8, #0 \n\t"                  \
    "adds r11, r11, r1 \n\t"               \
    "adcs r12, r12, r10 \n\t"              \
    "adc r8, r8, #0 \n\t"                  \
    "umull r1, r10, r6, r6 \n\t"           \
    "adds r11, r11, r1 \n\t"               \
    "adcs r12, r12, r10 \n\t"              \
    "adc r8, r8, #0 \n\t"                  \
    "stmia r0!, {r11} \n\t"                \
                                           \
    "mov r11, #0 \n\t"                     \
    "umull r1, r10, r6, r7 \n\t"           \
    "adds r1, r1, r1 \n\t"                 \
    "adcs r10, r10, r10 \n\t"              \
    "adc r11, r11, #0 \n\t"                \
    "adds r12, r12, r1 \n\t"               \
    "adcs r8, r8, r10 \n\t"                \
    "adc r11, r11, #0 \n\t"                \
    "stmia r0!, {r12} \n\t"                \
                                           \
    "umull r1, r10, r7, r7 \n\t"           \
    "adds r8, r8, r1 \n\t"                 \
    "adcs r11, r11, r10 \n\t"              \
    "stmia r0!, {r8, r11} \n\t"

#define FAST_SQUARE_ASM_7                      \
    "ldmia r1!, {r2} \n\t"                     \
    "add r1, 20 \n\t"                          \
    "ldmia r1!, {r5} \n\t"                     \
    "add r0, 24 \n\t"                          \
    "umull r8, r9, r2, r5 \n\t"                \
    "stmia r0!, {r8, r9} \n\t"                 \
    "sub r0, 32 \n\t"                          \
    "sub r1, 28 \n\t"                          \
                                               \
    "ldmia r1!, {r2, r3, r4, r5, r6, r7} \n\t" \
                                               \
    "umull r11, r12, r2, r2 \n\t"              \
    "stmia r0!, {r11} \n\t"                    \
                                               \
    "mov r9, #0 \n\t"                          \
    "umull r10, r11, r2, r3 \n\t"              \
    "adds r12, r12, r10 \n\t"                  \
    "adcs r8, r11, #0 \n\t"                    \
    "adc r9, r9, #0 \n\t"                      \
    "adds r12, r12, r10 \n\t"                  \
    "adcs r8, r8, r11 \n\t"                    \
    "adc r9, r9, #0 \n\t"                      \
    "stmia r0!, {r12} \n\t"                    \
                                               \
    "mov r10, #0 \n\t"                         \
    "umull r11, r12, r2, r4 \n\t"              \
    "adds r11, r11, r11 \n\t"                  \
    "adcs r12, r12, r12 \n\t"                  \
    "adc r10, r10, #0 \n\t"                    \
    "adds r8, r8, r11 \n\t"                    \
    "adcs r9, r9, r12 \n\t"                    \
    "adc r10, r10, #0 \n\t"                    \
    "umull r11, r12, r3, r3 \n\t"              \
    "adds r8, r8, r11 \n\t"                    \
    "adcs r9, r9, r12 \n\t"                    \
    "adc r10, r10, #0 \n\t"                    \
    "stmia r0!, {r8} \n\t"                     \
                                               \
    "mov r12, #0 \n\t"                         \
    "umull r8, r11, r2, r5 \n\t"               \
    "mov r14, r11 \n\t"                        \
    "umlal r8, r11, r3, r4 \n\t"               \
    "cmp r14, r11 \n\t"                        \
    "it hi \n\t"                               \
    "adchi r12, r12, #0 \n\t"                  \
    "adds r8, r8, r8 \n\t"                     \
    "adcs r11, r11, r11 \n\t"                  \
    "adc r12, r12, r12 \n\t"                   \
    "adds r8, r8, r9 \n\t"                     \
    "adcs r11, r11, r10 \n\t"                  \
    "adc r12, r12, #0 \n\t"                    \
    "stmia r0!, {r8} \n\t"                     \
                                               \
    "mov r10, #0 \n\t"                         \
    "umull r8, r9, r2, r6 \n\t"                \
    "mov r14, r9 \n\t"                         \
    "umlal r8, r9, r3, r5 \n\t"                \
    "cmp r14, r9 \n\t"                         \
    "it hi \n\t"                               \
    "adchi r10, r10, #0 \n\t"                  \
    "adds r8, r8, r8 \n\t"                     \
    "adcs r9, r9, r9 \n\t"                     \
    "adc r10, r10, r10 \n\t"                   \
    "mov r14, r9 \n\t"                         \
    "umlal r8, r9, r4, r4 \n\t"                \
    "cmp r14, r9 \n\t"                         \
    "it hi \n\t"                               \
    "adchi r10, r10, #0 \n\t"                  \
    "adds r8, r8, r11 \n\t"                    \
    "adcs r9, r9, r12 \n\t"                    \
    "adc r10, r10, #0 \n\t"                    \
    "stmia r0!, {r8} \n\t"                     \
                                               \
    "mov r12, #0 \n\t"                         \
    "umull r8, r11, r2, r7 \n\t"               \
    "mov r14, r11 \n\t"                        \
    "umlal r8, r11, r3, r6 \n\t"               \
    "cmp r14, r11 \n\t"                        \
    "it hi \n\t"                               \
    "adchi r12, r12, #0 \n\t"                  \
    "mov r14, r11 \n\t"                        \
    "umlal r8, r11, r4, r5 \n\t"               \
    "cmp r14, r11 \n\t"                        \
    "it hi \n\t"                               \
    "adchi r12, r12, #0 \n\t"                  \
    "adds r8, r8, r8 \n\t"                     \
    "adcs r11, r11, r11 \n\t"                  \
    "adc r12, r12, r12 \n\t"                   \
    "adds r8, r8, r9 \n\t"                     \
    "adcs r11, r11, r10 \n\t"                  \
    "adc r12, r12, #0 \n\t"                    \
    "stmia r0!, {r8} \n\t"                     \
                                               \
    "ldmia r1!, {r2} \n\t"                     \
    "mov r10, #0 \n\t"                         \
    "umull r8, r9, r3, r7 \n\t"                \
    "mov r14, r9 \n\t"                         \
    "umlal r8, r9, r4, r6 \n\t"                \
    "cmp r14, r9 \n\t"                         \
    "it hi \n\t"                               \
    "adchi r10, r10, #0 \n\t"                  \
    "ldr r14, [r0] \n\t"                       \
    "adds r8, r8, r14 \n\t"                    \
    "adcs r9, r9, #0 \n\t"                     \
    "adc r10, r10, #0 \n\t"                    \
    "adds r8, r8, r8 \n\t"                     \
    "adcs r9, r9, r9 \n\t"                     \
    "adc r10, r10, r10 \n\t"                   \
    "mov r14, r9 \n\t"                         \
    "umlal r8, r9, r5, r5 \n\t"                \
    "cmp r14, r9 \n\t"                         \
    "it hi \n\t"                               \
    "adchi r10, r10, #0 \n\t"                  \
    "adds r8, r8, r11 \n\t"                    \
    "adcs r9, r9, r12 \n\t"                    \
    "adc r10, r10, #0 \n\t"                    \
    "stmia r0!, {r8} \n\t"                     \
                                               \
    "mov r12, #0 \n\t"                         \
    "umull r8, r11, r3, r2 \n\t"               \
    "mov r14, r11 \n\t"                        \
    "umlal r8, r11, r4, r7 \n\t"               \
    "cmp r14, r11 \n\t"                        \
    "it hi \n\t"                               \
    "adchi r12, r12, #0 \n\t"                  \
    "mov r14, r11 \n\t"                        \
    "umlal r8, r11, r5, r6 \n\t"               \
    "cmp r14, r11 \n\t"                        \
    "it hi \n\t"                               \
    "adchi r12, r12, #0 \n\t"                  \
    "ldr r14, [r0] \n\t"                       \
    "adds r8, r8, r14 \n\t"                    \
    "adcs r11, r11, #0 \n\t"                   \
    "adc r12, r12, #0 \n\t"                    \
    "adds r8, r8, r8 \n\t"                     \
    "adcs r11, r11, r11 \n\t"                  \
    "adc r12, r12, r12 \n\t"                   \
    "adds r8, r8, r9 \n\t"                     \
    "adcs r11, r11, r10 \n\t"                  \
    "adc r12, r12, #0 \n\t"                    \
    "stmia r0!, {r8} \n\t"                     \
                                               \
    "mov r10, #0 \n\t"                         \
    "umull r8, r9, r4, r2 \n\t"                \
    "mov r14, r9 \n\t"                         \
    "umlal r8, r9, r5, r7 \n\t"                \
    "cmp r14, r9 \n\t"                         \
    "it hi \n\t"                               \
    "adchi r10, r10, #0 \n\t"                  \
    "adds r8, r8, r8 \n\t"                     \
    "adcs r9, r9, r9 \n\t"                     \
    "adc r10, r10, r10 \n\t"                   \
    "mov r14, r9 \n\t"                         \
    "umlal r8, r9, r6, r6 \n\t"                \
    "cmp r14, r9 \n\t"                         \
    "it hi \n\t"                               \
    "adchi r10, r10, #0 \n\t"                  \
    "adds r8, r8, r11 \n\t"                    \
    "adcs r9, r9, r12 \n\t"                    \
    "adc r10, r10, #0 \n\t"                    \
    "stmia r0!, {r8} \n\t"                     \
                                               \
    "mov r12, #0 \n\t"                         \
    "umull r8, r11, r5, r2 \n\t"               \
    "mov r14, r11 \n\t"                        \
    "umlal r8, r11, r6, r7 \n\t"               \
    "cmp r14, r11 \n\t"                        \
    "it hi \n\t"                               \
    "adchi r12, r12, #0 \n\t"                  \
    "adds r8, r8, r8 \n\t"                     \
    "adcs r11, r11, r11 \n\t"                  \
    "adc r12, r12, r12 \n\t"                   \
    "adds r8, r8, r9 \n\t"                     \
    "adcs r11, r11, r10 \n\t"                  \
    "adc r12, r12, #0 \n\t"                    \
    "stmia r0!, {r8} \n\t"                     \
                                               \
    "mov r8, #0 \n\t"                          \
    "umull r1, r10, r6, r2 \n\t"               \
    "adds r1, r1, r1 \n\t"                     \
    "adcs r10, r10, r10 \n\t"                  \
    "adc r8, r8, #0 \n\t"                      \
    "adds r11, r11, r1 \n\t"                   \
    "adcs r12, r12, r10 \n\t"                  \
    "adc r8, r8, #0 \n\t"                      \
    "umull r1, r10, r7, r7 \n\t"               \
    "adds r11, r11, r1 \n\t"                   \
    "adcs r12, r12, r10 \n\t"                  \
    "adc r8, r8, #0 \n\t"                      \
    "stmia r0!, {r11} \n\t"                    \
                                               \
    "mov r11, #0 \n\t"                         \
    "umull r1, r10, r7, r2 \n\t"               \
    "adds r1, r1, r1 \n\t"                     \
    "adcs r10, r10, r10 \n\t"                  \
    "adc r11, r11, #0 \n\t"                    \
    "adds r12, r12, r1 \n\t"                   \
    "adcs r8, r8, r10 \n\t"                    \
    "adc r11, r11, #0 \n\t"                    \
    "stmia r0!, {r12} \n\t"                    \
                                               \
    "umull r1, r10, r2, r2 \n\t"               \
    "adds r8, r8, r1 \n\t"                     \
    "adcs r11, r11, r10 \n\t"                  \
    "stmia r0!, {r8, r11} \n\t"

#define FAST_SQUARE_ASM_8                   \
    "ldmia r1!, {r2, r3} \n\t"              \
    "add r1, 16 \n\t"                       \
    "ldmia r1!, {r5, r6} \n\t"              \
    "add r0, 24 \n\t"                       \
                                            \
    "umull r8, r9, r2, r5 \n\t"             \
    "stmia r0!, {r8} \n\t"                  \
                                            \
    "umull r12, r10, r2, r6 \n\t"           \
    "adds r9, r9, r12 \n\t"                 \
    "adc r10, r10, #0 \n\t"                 \
    "stmia r0!, {r9} \n\t"                  \
                                            \
    "umull r8, r9, r3, r6 \n\t"             \
    "adds r10, r10, r8 \n\t"                \
    "adc r11, r9, #0 \n\t"                  \
    "stmia r0!, {r10, r11} \n\t"            \
                                            \
    "sub r0, 40 \n\t"                       \
    "sub r1, 32 \n\t"                       \
    "ldmia r1!, {r2,r3,r4,r5,r6,r7} \n\t"   \
                                            \
    "umull r11, r12, r2, r2 \n\t"           \
    "stmia r0!, {r11} \n\t"                 \
                                            \
    "mov r9, #0 \n\t"                       \
    "umull r10, r11, r2, r3 \n\t"           \
    "adds r12, r12, r10 \n\t"               \
    "adcs r8, r11, #0 \n\t"                 \
    "adc r9, r9, #0 \n\t"                   \
    "adds r12, r12, r10 \n\t"               \
    "adcs r8, r8, r11 \n\t"                 \
    "adc r9, r9, #0 \n\t"                   \
    "stmia r0!, {r12} \n\t"                 \
                                            \
    "mov r10, #0 \n\t"                      \
    "umull r11, r12, r2, r4 \n\t"           \
    "adds r11, r11, r11 \n\t"               \
    "adcs r12, r12, r12 \n\t"               \
    "adc r10, r10, #0 \n\t"                 \
    "adds r8, r8, r11 \n\t"                 \
    "adcs r9, r9, r12 \n\t"                 \
    "adc r10, r10, #0 \n\t"                 \
    "umull r11, r12, r3, r3 \n\t"           \
    "adds r8, r8, r11 \n\t"                 \
    "adcs r9, r9, r12 \n\t"                 \
    "adc r10, r10, #0 \n\t"                 \
    "stmia r0!, {r8} \n\t"                  \
                                            \
    "mov r12, #0 \n\t"                      \
    "umull r8, r11, r2, r5 \n\t"            \
    "mov r14, r11 \n\t"                     \
    "umlal r8, r11, r3, r4 \n\t"            \
    "cmp r14, r11 \n\t"                     \
    "it hi \n\t"                            \
    "adchi r12, r12, #0 \n\t"               \
    "adds r8, r8, r8 \n\t"                  \
    "adcs r11, r11, r11 \n\t"               \
    "adc r12, r12, r12 \n\t"                \
    "adds r8, r8, r9 \n\t"                  \
    "adcs r11, r11, r10 \n\t"               \
    "adc r12, r12, #0 \n\t"                 \
    "stmia r0!, {r8} \n\t"                  \
                                            \
    "mov r10, #0 \n\t"                      \
    "umull r8, r9, r2, r6 \n\t"             \
    "mov r14, r9 \n\t"                      \
    "umlal r8, r9, r3, r5 \n\t"             \
    "cmp r14, r9 \n\t"                      \
    "it hi \n\t"                            \
    "adchi r10, r10, #0 \n\t"               \
    "adds r8, r8, r8 \n\t"                  \
    "adcs r9, r9, r9 \n\t"                  \
    "adc r10, r10, r10 \n\t"                \
    "mov r14, r9 \n\t"                      \
    "umlal r8, r9, r4, r4 \n\t"             \
    "cmp r14, r9 \n\t"                      \
    "it hi \n\t"                            \
    "adchi r10, r10, #0 \n\t"               \
    "adds r8, r8, r11 \n\t"                 \
    "adcs r9, r9, r12 \n\t"                 \
    "adc r10, r10, #0 \n\t"                 \
    "stmia r0!, {r8} \n\t"                  \
                                            \
    "mov r12, #0 \n\t"                      \
    "umull r8, r11, r2, r7 \n\t"            \
    "mov r14, r11 \n\t"                     \
    "umlal r8, r11, r3, r6 \n\t"            \
    "cmp r14, r11 \n\t"                     \
    "it hi \n\t"                            \
    "adchi r12, r12, #0 \n\t"               \
    "mov r14, r11 \n\t"                     \
    "umlal r8, r11, r4, r5 \n\t"            \
    "cmp r14, r11 \n\t"                     \
    "it hi \n\t"                            \
    "adchi r12, r12, #0 \n\t"               \
    "adds r8, r8, r8 \n\t"                  \
    "adcs r11, r11, r11 \n\t"               \
    "adc r12, r12, r12 \n\t"                \
    "adds r8, r8, r9 \n\t"                  \
    "adcs r11, r11, r10 \n\t"               \
    "adc r12, r12, #0 \n\t"                 \
    "stmia r0!, {r8} \n\t"                  \
                                            \
    "ldmia r1!, {r2} \n\t"                  \
    "mov r10, #0 \n\t"                      \
    "umull r8, r9, r3, r7 \n\t"             \
    "mov r14, r9 \n\t"                      \
    "umlal r8, r9, r4, r6 \n\t"             \
    "cmp r14, r9 \n\t"                      \
    "it hi \n\t"                            \
    "adchi r10, r10, #0 \n\t"               \
    "ldr r14, [r0] \n\t"                    \
    "adds r8, r8, r14 \n\t"                 \
    "adcs r9, r9, #0 \n\t"                  \
    "adc r10, r10, #0 \n\t"                 \
    "adds r8, r8, r8 \n\t"                  \
    "adcs r9, r9, r9 \n\t"                  \
    "adc r10, r10, r10 \n\t"                \
    "mov r14, r9 \n\t"                      \
    "umlal r8, r9, r5, r5 \n\t"             \
    "cmp r14, r9 \n\t"                      \
    "it hi \n\t"                            \
    "adchi r10, r10, #0 \n\t"               \
    "adds r8, r8, r11 \n\t"                 \
    "adcs r9, r9, r12 \n\t"                 \
    "adc r10, r10, #0 \n\t"                 \
    "stmia r0!, {r8} \n\t"                  \
                                            \
    "mov r12, #0 \n\t"                      \
    "umull r8, r11, r3, r2 \n\t"            \
    "mov r14, r11 \n\t"                     \
    "umlal r8, r11, r4, r7 \n\t"            \
    "cmp r14, r11 \n\t"                     \
    "it hi \n\t"                            \
    "adchi r12, r12, #0 \n\t"               \
    "mov r14, r11 \n\t"                     \
    "umlal r8, r11, r5, r6 \n\t"            \
    "cmp r14, r11 \n\t"                     \
    "it hi \n\t"                            \
    "adchi r12, r12, #0 \n\t"               \
    "ldr r14, [r0] \n\t"                    \
    "adds r8, r8, r14 \n\t"                 \
    "adcs r11, r11, #0 \n\t"                \
    "adc r12, r12, #0 \n\t"                 \
    "adds r8, r8, r8 \n\t"                  \
    "adcs r11, r11, r11 \n\t"               \
    "adc r12, r12, r12 \n\t"                \
    "adds r8, r8, r9 \n\t"                  \
    "adcs r11, r11, r10 \n\t"               \
    "adc r12, r12, #0 \n\t"                 \
    "stmia r0!, {r8} \n\t"                  \
                                            \
    "ldmia r1!, {r3} \n\t"                  \
    "mov r10, #0 \n\t"                      \
    "umull r8, r9, r4, r2 \n\t"             \
    "mov r14, r9 \n\t"                      \
    "umlal r8, r9, r5, r7 \n\t"             \
    "cmp r14, r9 \n\t"                      \
    "it hi \n\t"                            \
    "adchi r10, r10, #0 \n\t"               \
    "ldr r14, [r0] \n\t"                    \
    "adds r8, r8, r14 \n\t"                 \
    "adcs r9, r9, #0 \n\t"                  \
    "adc r10, r10, #0 \n\t"                 \
    "adds r8, r8, r8 \n\t"                  \
    "adcs r9, r9, r9 \n\t"                  \
    "adc r10, r10, r10 \n\t"                \
    "mov r14, r9 \n\t"                      \
    "umlal r8, r9, r6, r6 \n\t"             \
    "cmp r14, r9 \n\t"                      \
    "it hi \n\t"                            \
    "adchi r10, r10, #0 \n\t"               \
    "adds r8, r8, r11 \n\t"                 \
    "adcs r9, r9, r12 \n\t"                 \
    "adc r10, r10, #0 \n\t"                 \
    "stmia r0!, {r8} \n\t"                  \
                                            \
    "mov r12, #0 \n\t"                      \
    "umull r8, r11, r4, r3 \n\t"            \
    "mov r14, r11 \n\t"                     \
    "umlal r8, r11, r5, r2 \n\t"            \
    "cmp r14, r11 \n\t"                     \
    "it hi \n\t"                            \
    "adchi r12, r12, #0 \n\t"               \
    "mov r14, r11 \n\t"                     \
    "umlal r8, r11, r6, r7 \n\t"            \
    "cmp r14, r11 \n\t"                     \
    "it hi \n\t"                            \
    "adchi r12, r12, #0 \n\t"               \
    "ldr r14, [r0] \n\t"                    \
    "adds r8, r8, r14 \n\t"                 \
    "adcs r11, r11, #0 \n\t"                \
    "adc r12, r12, #0 \n\t"                 \
    "adds r8, r8, r8 \n\t"                  \
    "adcs r11, r11, r11 \n\t"               \
    "adc r12, r12, r12 \n\t"                \
    "adds r8, r8, r9 \n\t"                  \
    "adcs r11, r11, r10 \n\t"               \
    "adc r12, r12, #0 \n\t"                 \
    "stmia r0!, {r8} \n\t"                  \
                                            \
    "mov r10, #0 \n\t"                      \
    "umull r8, r9, r5, r3 \n\t"             \
    "mov r14, r9 \n\t"                      \
    "umlal r8, r9, r6, r2 \n\t"             \
    "cmp r14, r9 \n\t"                      \
    "it hi \n\t"                            \
    "adchi r10, r10, #0 \n\t"               \
    "adds r8, r8, r8 \n\t"                  \
    "adcs r9, r9, r9 \n\t"                  \
    "adc r10, r10, r10 \n\t"                \
    "mov r14, r9 \n\t"                      \
    "umlal r8, r9, r7, r7 \n\t"             \
    "cmp r14, r9 \n\t"                      \
    "it hi \n\t"                            \
    "adchi r10, r10, #0 \n\t"               \
    "adds r8, r8, r11 \n\t"                 \
    "adcs r9, r9, r12 \n\t"                 \
    "adc r10, r10, #0 \n\t"                 \
    "stmia r0!, {r8} \n\t"                  \
                                            \
    "mov r12, #0 \n\t"                      \
    "umull r8, r11, r6, r3 \n\t"            \
    "mov r14, r11 \n\t"                     \
    "umlal r8, r11, r7, r2 \n\t"            \
    "cmp r14, r11 \n\t"                     \
    "it hi \n\t"                            \
    "adchi r12, r12, #0 \n\t"               \
    "adds r8, r8, r8 \n\t"                  \
    "adcs r11, r11, r11 \n\t"               \
    "adc r12, r12, r12 \n\t"                \
    "adds r8, r8, r9 \n\t"                  \
    "adcs r11, r11, r10 \n\t"               \
    "adc r12, r12, #0 \n\t"                 \
    "stmia r0!, {r8} \n\t"                  \
                                            \
    "mov r8, #0 \n\t"                       \
    "umull r1, r10, r7, r3 \n\t"            \
    "adds r1, r1, r1 \n\t"                  \
    "adcs r10, r10, r10 \n\t"               \
    "adc r8, r8, #0 \n\t"                   \
    "adds r11, r11, r1 \n\t"                \
    "adcs r12, r12, r10 \n\t"               \
    "adc r8, r8, #0 \n\t"                   \
    "umull r1, r10, r2, r2 \n\t"            \
    "adds r11, r11, r1 \n\t"                \
    "adcs r12, r12, r10 \n\t"               \
    "adc r8, r8, #0 \n\t"                   \
    "stmia r0!, {r11} \n\t"                 \
                                            \
    "mov r11, #0 \n\t"                      \
    "umull r1, r10, r2, r3 \n\t"            \
    "adds r1, r1, r1 \n\t"                  \
    "adcs r10, r10, r10 \n\t"               \
    "adc r11, r11, #0 \n\t"                 \
    "adds r12, r12, r1 \n\t"                \
    "adcs r8, r8, r10 \n\t"                 \
    "adc r11, r11, #0 \n\t"                 \
    "stmia r0!, {r12} \n\t"                 \
                                            \
    "umull r1, r10, r3, r3 \n\t"            \
    "adds r8, r8, r1 \n\t"                  \
    "adcs r11, r11, r10 \n\t"               \
    "stmia r0!, {r8, r11} \n\t"

#endif /* _UECC_ASM_ARM_MULT_SQUARE_H_ */


================================================
FILE: u2f/curve-specific.h
================================================
/* Copyright 2015, Kenneth MacKay. Licensed under the BSD 2-clause license. */

#ifndef _UECC_CURVE_SPECIFIC_H_
#define _UECC_CURVE_SPECIFIC_H_

#define num_bytes_secp160r1 20
#define num_bytes_secp192r1 24
#define num_bytes_secp224r1 28
#define num_bytes_secp256r1 32
#define num_bytes_secp256k1 32

#if (uECC_WORD_SIZE == 1)

#define num_words_secp160r1 20
#define num_words_secp192r1 24
#define num_words_secp224r1 28
#define num_words_secp256r1 32
#define num_words_secp256k1 32

#define BYTES_TO_WORDS_8(a, b, c, d, e, f, g, h) \
    0x##a, 0x##b, 0x##c, 0x##d, 0x##e, 0x##f, 0x##g, 0x##h
#define BYTES_TO_WORDS_4(a, b, c, d) 0x##a, 0x##b, 0x##c, 0x##d

#elif (uECC_WORD_SIZE == 4)

#define num_words_secp160r1 5
#define num_words_secp192r1 6
#define num_words_secp224r1 7
#define num_words_secp256r1 8
#define num_words_secp256k1 8

#define BYTES_TO_WORDS_8(a, b, c, d, e, f, g, h) 0x##d##c##b##a, 0x##h##g##f##e
#define BYTES_TO_WORDS_4(a, b, c, d) 0x##d##c##b##a

#elif (uECC_WORD_SIZE == 8)

#define num_words_secp160r1 3
#define num_words_secp192r1 3
#define num_words_secp224r1 4
#define num_words_secp256r1 4
#define num_words_secp256k1 4

#define BYTES_TO_WORDS_8(a, b, c, d, e, f, g, h) 0x##h##g##f##e##d##c##b##a##ull
#define BYTES_TO_WORDS_4(a, b, c, d) 0x##d##c##b##a##ull

#endif /* uECC_WORD_SIZE */

#if uECC_SUPPORTS_secp160r1 || uECC_SUPPORTS_secp192r1 || \
    uECC_SUPPORTS_secp224r1 || uECC_SUPPORTS_secp256r1
static void double_jacobian_default(uECC_word_t * X1,
                                    uECC_word_t * Y1,
                                    uECC_word_t * Z1,
                                    uECC_Curve curve) {
    /* t1 = X, t2 = Y, t3 = Z */
    uECC_word_t t4[uECC_MAX_WORDS];
    uECC_word_t t5[uECC_MAX_WORDS];
    wordcount_t num_words = curve->num_words;

    if (uECC_vli_isZero(Z1, num_words)) {
        return;
    }

    uECC_vli_modSquare_fast(t4, Y1, curve);   /* t4 = y1^2 */
    uECC_vli_modMult_fast(t5, X1, t4, curve); /* t5 = x1*y1^2 = A */
    uECC_vli_modSquare_fast(t4, t4, curve);   /* t4 = y1^4 */
    uECC_vli_modMult_fast(Y1, Y1, Z1, curve); /* t2 = y1*z1 = z3 */
    uECC_vli_modSquare_fast(Z1, Z1, curve);   /* t3 = z1^2 */

    uECC_vli_modAdd(X1, X1, Z1, curve->p, num_words); /* t1 = x1 + z1^2 */
    uECC_vli_modAdd(Z1, Z1, Z1, curve->p, num_words); /* t3 = 2*z1^2 */
    uECC_vli_modSub(Z1, X1, Z1, curve->p, num_words); /* t3 = x1 - z1^2 */
    uECC_vli_modMult_fast(X1, X1, Z1, curve);                /* t1 = x1^2 - z1^4 */

    uECC_vli_modAdd(Z1, X1, X1, curve->p, num_words); /* t3 = 2*(x1^2 - z1^4) */
    uECC_vli_modAdd(X1, X1, Z1, curve->p, num_words); /* t1 = 3*(x1^2 - z1^4) */
    if (uECC_vli_testBit(X1, 0)) {
        uECC_word_t l_carry = uECC_vli_add(X1, X1, curve->p, num_words);
        uECC_vli_rshift1(X1, num_words);
        X1[num_words - 1] |= l_carry << (uECC_WORD_BITS - 1);
    } else {
        uECC_vli_rshift1(X1, num_words);
    }
    /* t1 = 3/2*(x1^2 - z1^4) = B */

    uECC_vli_modSquare_fast(Z1, X1, curve);                  /* t3 = B^2 */
    uECC_vli_modSub(Z1, Z1, t5, curve->p, num_words); /* t3 = B^2 - A */
    uECC_vli_modSub(Z1, Z1, t5, curve->p, num_words); /* t3 = B^2 - 2A = x3 */
    uECC_vli_modSub(t5, t5, Z1, curve->p, num_words); /* t5 = A - x3 */
    uECC_vli_modMult_fast(X1, X1, t5, curve);                /* t1 = B * (A - x3) */
    uECC_vli_modSub(t4, X1, t4, curve->p, num_words); /* t4 = B * (A - x3) - y1^4 = y3 */

    uECC_vli_set(X1, Z1, num_words);
    uECC_vli_set(Z1, Y1, num_words);
    uECC_vli_set(Y1, t4, num_words);
}

/* Computes result = x^3 + ax + b. result must not overlap x. */
static void x_side_default(uECC_word_t *result, const uECC_word_t *x, uECC_Curve curve) {
    uECC_word_t _3[uECC_MAX_WORDS] = {3}; /* -a = 3 */
    wordcount_t num_words = curve->num_words;

    uECC_vli_modSquare_fast(result, x, curve);                             /* r = x^2 */
    uECC_vli_modSub(result, result, _3, curve->p, num_words);       /* r = x^2 - 3 */
    uECC_vli_modMult_fast(result, result, x, curve);                       /* r = x^3 - 3x */
    uECC_vli_modAdd(result, result, curve->b, curve->p, num_words); /* r = x^3 - 3x + b */
}
#endif /* uECC_SUPPORTS_secp... */

#if uECC_SUPPORT_COMPRESSED_POINT
#if uECC_SUPPORTS_secp160r1 || uECC_SUPPORTS_secp192r1 || \
    uECC_SUPPORTS_secp256r1 || uECC_SUPPORTS_secp256k1
/* Compute a = sqrt(a) (mod curve_p). */
static void mod_sqrt_default(uECC_word_t *a, uECC_Curve curve) {
    bitcount_t i;
    uECC_word_t p1[uECC_MAX_WORDS] = {1};
    uECC_word_t l_result[uECC_MAX_WORDS] = {1};
    wordcount_t num_words = curve->num_words;
    
    /* When curve->p == 3 (mod 4), we can compute
       sqrt(a) = a^((curve->p + 1) / 4) (mod curve->p). */
    uECC_vli_add(p1, curve->p, p1, num_words); /* p1 = curve_p + 1 */
    for (i = uECC_vli_numBits(p1, num_words) - 1; i > 1; --i) {
        uECC_vli_modSquare_fast(l_result, l_result, curve);
        if (uECC_vli_testBit(p1, i)) {
            uECC_vli_modMult_fast(l_result, l_result, a, curve);
        }
    }
    uECC_vli_set(a, l_result, num_words);
}
#endif /* uECC_SUPPORTS_secp... */
#endif /* uECC_SUPPORT_COMPRESSED_POINT */

#if uECC_SUPPORTS_secp160r1

#if (uECC_OPTIMIZATION_LEVEL > 0)
static void vli_mmod_fast_secp160r1(uECC_word_t *result, uECC_word_t *product);
#endif

static const struct uECC_Curve_t curve_secp160r1 = {
    num_words_secp160r1,
    num_bytes_secp160r1,
    161, /* num_n_bits */
    { BYTES_TO_WORDS_8(FF, FF, FF, 7F, FF, FF, FF, FF),
        BYTES_TO_WORDS_8(FF, FF, FF, FF, FF, FF, FF, FF),
        BYTES_TO_WORDS_4(FF, FF, FF, FF) },
    { BYTES_TO_WORDS_8(57, 22, 75, CA, D3, AE, 27, F9),
        BYTES_TO_WORDS_8(C8, F4, 01, 00, 00, 00, 00, 00),
        BYTES_TO_WORDS_8(00, 00, 00, 00, 01, 00, 00, 00) },
    { BYTES_TO_WORDS_8(82, FC, CB, 13, B9, 8B, C3, 68),
        BYTES_TO_WORDS_8(89, 69, 64, 46, 28, 73, F5, 8E),
        BYTES_TO_WORDS_4(68, B5, 96, 4A),

        BYTES_TO_WORDS_8(32, FB, C5, 7A, 37, 51, 23, 04),
        BYTES_TO_WORDS_8(12, C9, DC, 59, 7D, 94, 68, 31),
        BYTES_TO_WORDS_4(55, 28, A6, 23) },
    { BYTES_TO_WORDS_8(45, FA, 65, C5, AD, D4, D4, 81),
        BYTES_TO_WORDS_8(9F, F8, AC, 65, 8B, 7A, BD, 54),
        BYTES_TO_WORDS_4(FC, BE, 97, 1C) },
    &double_jacobian_default,
#if uECC_SUPPORT_COMPRESSED_POINT
    &mod_sqrt_default,
#endif
    &x_side_default,
#if (uECC_OPTIMIZATION_LEVEL > 0)
    &vli_mmod_fast_secp160r1
#endif
};

uECC_Curve uECC_secp160r1(void) { return &curve_secp160r1; }

#if (uECC_OPTIMIZATION_LEVEL > 0 && !asm_mmod_fast_secp160r1)
/* Computes result = product % curve_p
    see http://www.isys.uni-klu.ac.at/PDF/2001-0126-MT.pdf page 354
    
    Note that this only works if log2(omega) < log2(p) / 2 */
static void omega_mult_secp160r1(uECC_word_t *result, const uECC_word_t *right);
#if uECC_WORD_SIZE == 8
static void vli_mmod_fast_secp160r1(uECC_word_t *result, uECC_word_t *product) {
    uECC_word_t tmp[2 * num_words_secp160r1];
    uECC_word_t copy;
    
    uECC_vli_clear(tmp, num_words_secp160r1);
    uECC_vli_clear(tmp + num_words_secp160r1, num_words_secp160r1);

    omega_mult_secp160r1(tmp, product + num_words_secp160r1 - 1); /* (Rq, q) = q * c */
    
    product[num_words_secp160r1 - 1] &= 0xffffffff;
    copy = tmp[num_words_secp160r1 - 1];
    tmp[num_words_secp160r1 - 1] &= 0xffffffff;
    uECC_vli_add(result, product, tmp, num_words_secp160r1); /* (C, r) = r + q */
    uECC_vli_clear(product, num_words_secp160r1);
    tmp[num_words_secp160r1 - 1] = copy;
    omega_mult_secp160r1(product, tmp + num_words_secp160r1 - 1); /* Rq*c */
    uECC_vli_add(result, result, product, num_words_secp160r1); /* (C1, r) = r + Rq*c */

    while (uECC_vli_cmp_unsafe(result, curve_secp160r1.p, num_words_secp160r1) > 0) {
        uECC_vli_sub(result, result, curve_secp160r1.p, num_words_secp160r1);
    }
}

static void omega_mult_secp160r1(uint64_t *result, const uint64_t *right) {
    uint32_t carry;
    unsigned i;
    
    /* Multiply by (2^31 + 1). */
    carry = 0;
    for (i = 0; i < num_words_secp160r1; ++i) {
        uint64_t tmp = (right[i] >> 32) | (right[i + 1] << 32);
        result[i] = (tmp << 31) + tmp + carry;
        carry = (tmp >> 33) + (result[i] < tmp || (carry && result[i] == tmp));
    }
    result[i] = carry;
}
#else
static void vli_mmod_fast_secp160r1(uECC_word_t *result, uECC_word_t *product) {
    uECC_word_t tmp[2 * num_words_secp160r1];
    uECC_word_t carry;
    
    uECC_vli_clear(tmp, num_words_secp160r1);
    uECC_vli_clear(tmp + num_words_secp160r1, num_words_secp160r1);

    omega_mult_secp160r1(tmp, product + num_words_secp160r1); /* (Rq, q) = q * c */
    
    carry = uECC_vli_add(result, product, tmp, num_words_secp160r1); /* (C, r) = r + q */
    uECC_vli_clear(product, num_words_secp160r1);
    omega_mult_secp160r1(product, tmp + num_words_secp160r1); /* Rq*c */
    carry += uECC_vli_add(result, result, product, num_words_secp160r1); /* (C1, r) = r + Rq*c */

    while (carry > 0) {
        --carry;
        uECC_vli_sub(result, result, curve_secp160r1.p, num_words_secp160r1);
    }
    if (uECC_vli_cmp_unsafe(result, curve_secp160r1.p, num_words_secp160r1) > 0) {
        uECC_vli_sub(result, result, curve_secp160r1.p, num_words_secp160r1);
    }
}
#endif

#if uECC_WORD_SIZE == 1
static void omega_mult_secp160r1(uint8_t *result, const uint8_t *right) {
    uint8_t carry;
    uint8_t i;
    
    /* Multiply by (2^31 + 1). */
    uECC_vli_set(result + 4, right, num_words_secp160r1); /* 2^32 */
    uECC_vli_rshift1(result + 4, num_words_secp160r1); /* 2^31 */
    result[3] = right[0] << 7; /* get last bit from shift */
    
    carry = uECC_vli_add(result, result, right, num_words_secp160r1); /* 2^31 + 1 */
    for (i = num_words_secp160r1; carry; ++i) {
        uint16_t sum = (uint16_t)result[i] + carry;
        result[i] = (uint8_t)sum;
        carry = sum >> 8;
    }
}
#elif uECC_WORD_SIZE == 4
static void omega_mult_secp160r1(uint32_t *result, const uint32_t *right) {
    uint32_t carry;
    unsigned i;
    
    /* Multiply by (2^31 + 1). */
    uECC_vli_set(result + 1, right, num_words_secp160r1); /* 2^32 */
    uECC_vli_rshift1(result + 1, num_words_secp160r1); /* 2^31 */
    result[0] = right[0] << 31; /* get last bit from shift */
    
    carry = uECC_vli_add(result, result, right, num_words_secp160r1); /* 2^31 + 1 */
    for (i = num_words_secp160r1; carry; ++i) {
        uint64_t sum = (uint64_t)result[i] + carry;
        result[i] = (uint32_t)sum;
        carry = sum >> 32;
    }
}
#endif /* uECC_WORD_SIZE */
#endif /* (uECC_OPTIMIZATION_LEVEL > 0 && !asm_mmod_fast_secp160r1) */

#endif /* uECC_SUPPORTS_secp160r1 */

#if uECC_SUPPORTS_secp192r1

#if (uECC_OPTIMIZATION_LEVEL > 0)
static void vli_mmod_fast_secp192r1(uECC_word_t *result, uECC_word_t *product);
#endif

static const struct uECC_Curve_t curve_secp192r1 = {
    num_words_secp192r1,
    num_bytes_secp192r1,
    192, /* num_n_bits */
    { BYTES_TO_WORDS_8(FF, FF, FF, FF, FF, FF, FF, FF),
        BYTES_TO_WORDS_8(FE, FF, FF, FF, FF, FF, FF, FF),
        BYTES_TO_WORDS_8(FF, FF, FF, FF, FF, FF, FF, FF) },
    { BYTES_TO_WORDS_8(31, 28, D2, B4, B1, C9, 6B, 14),
        BYTES_TO_WORDS_8(36, F8, DE, 99, FF, FF, FF, FF),
        BYTES_TO_WORDS_8(FF, FF, FF, FF, FF, FF, FF, FF) },
    { BYTES_TO_WORDS_8(12, 10, FF, 82, FD, 0A, FF, F4),
        BYTES_TO_WORDS_8(00, 88, A1, 43, EB, 20, BF, 7C),
        BYTES_TO_WORDS_8(F6, 90, 30, B0, 0E, A8, 8D, 18),

        BYTES_TO_WORDS_8(11, 48, 79, 1E, A1, 77, F9, 73),
        BYTES_TO_WORDS_8(D5, CD, 24, 6B, ED, 11, 10, 63),
        BYTES_TO_WORDS_8(78, DA, C8, FF, 95, 2B, 19, 07) },
    { BYTES_TO_WORDS_8(B1, B9, 46, C1, EC, DE, B8, FE),
        BYTES_TO_WORDS_8(49, 30, 24, 72, AB, E9, A7, 0F),
        BYTES_TO_WORDS_8(E7, 80, 9C, E5, 19, 05, 21, 64) },
    &double_jacobian_default,
#if uECC_SUPPORT_COMPRESSED_POINT
    &mod_sqrt_default,
#endif
    &x_side_default,
#if (uECC_OPTIMIZATION_LEVEL > 0)
    &vli_mmod_fast_secp192r1
#endif
};

uECC_Curve uECC_secp192r1(void) { return &curve_secp192r1; }

#if (uECC_OPTIMIZATION_LEVEL > 0)
/* Computes result = product % curve_p.
   See algorithm 5 and 6 from http://www.isys.uni-klu.ac.at/PDF/2001-0126-MT.pdf */
#if uECC_WORD_SIZE == 1
static void vli_mmod_fast_secp192r1(uint8_t *result, uint8_t *product) {
    uint8_t tmp[num_words_secp192r1];
    uint8_t carry;
    
    uECC_vli_set(result, product, num_words_secp192r1);
    
    uECC_vli_set(tmp, &product[24], num_words_secp192r1);
    carry = uECC_vli_add(result, result, tmp, num_words_secp192r1);
    
    tmp[0] = tmp[1] = tmp[2] = tmp[3] = tmp[4] = tmp[5] = tmp[6] = tmp[7] = 0;
    tmp[8] = product[24]; tmp[9] = product[25]; tmp[10] = product[26]; tmp[11] = product[27];
    tmp[12] = product[28]; tmp[13] = product[29]; tmp[14] = product[30]; tmp[15] = product[31];
    tmp[16] = product[32]; tmp[17] = product[33]; tmp[18] = product[34]; tmp[19] = product[35];
    tmp[20] = product[36]; tmp[21] = product[37]; tmp[22] = product[38]; tmp[23] = product[39];
    carry += uECC_vli_add(result, result, tmp, num_words_secp192r1);
    
    tmp[0] = tmp[8] = product[40];
    tmp[1] = tmp[9] = product[41];
    tmp[2] = tmp[10] = product[42];
    tmp[3] = tmp[11] = product[43];
    tmp[4] = tmp[12] = product[44];
    tmp[5] = tmp[13] = product[45];
    tmp[6] = tmp[14] = product[46];
    tmp[7] = tmp[15] = product[47];
    tmp[16] = tmp[17] = tmp[18] = tmp[19] = tmp[20] = tmp[21] = tmp[22] = tmp[23] = 0;
    carry += uECC_vli_add(result, result, tmp, num_words_secp192r1);
    
    while (carry || uECC_vli_cmp_unsafe(curve_secp192r1.p, result, num_words_secp192r1) != 1) {
        carry -= uECC_vli_sub(result, result, curve_secp192r1.p, num_words_secp192r1);
    }
}
#elif uECC_WORD_SIZE == 4
static void vli_mmod_fast_secp192r1(uint32_t *result, uint32_t *product) {
    uint32_t tmp[num_words_secp192r1];
    int carry;
    
    uECC_vli_set(result, product, num_words_secp192r1);
    
    uECC_vli_set(tmp, &product[6], num_words_secp192r1);
    carry = uECC_vli_add(result, result, tmp, num_words_secp192r1);
    
    tmp[0] = tmp[1] = 0;
    tmp[2] = product[6];
    tmp[3] = product[7];
    tmp[4] = product[8];
    tmp[5] = product[9];
    carry += uECC_vli_add(result, result, tmp, num_words_secp192r1);
    
    tmp[0] = tmp[2] = product[10];
    tmp[1] = tmp[3] = product[11];
    tmp[4] = tmp[5] = 0;
    carry += uECC_vli_add(result, result, tmp, num_words_secp192r1);
    
    while (carry || uECC_vli_cmp_unsafe(curve_secp192r1.p, result, num_words_secp192r1) != 1) {
        carry -= uECC_vli_sub(result, result, curve_secp192r1.p, num_words_secp192r1);
    }
}
#else
static void vli_mmod_fast_secp192r1(uint64_t *result, uint64_t *product) {
    uint64_t tmp[num_words_secp192r1];
    int carry;
    
    uECC_vli_set(result, product, num_words_secp192r1);
    
    uECC_vli_set(tmp, &product[3], num_words_secp192r1);
    carry = (int)uECC_vli_add(result, result, tmp, num_words_secp192r1);
    
    tmp[0] = 0;
    tmp[1] = product[3];
    tmp[2] = product[4];
    carry += uECC_vli_add(result, result, tmp, num_words_secp192r1);
    
    tmp[0] = tmp[1] = product[5];
    tmp[2] = 0;
    carry += uECC_vli_add(result, result, tmp, num_words_secp192r1);
    
    while (carry || uECC_vli_cmp_unsafe(curve_secp192r1.p, result, num_words_secp192r1) != 1) {
        carry -= uECC_vli_sub(result, result, curve_secp192r1.p, num_words_secp192r1);
    }
}
#endif /* uECC_WORD_SIZE */
#endif /* (uECC_OPTIMIZATION_LEVEL > 0) */

#endif /* uECC_SUPPORTS_secp192r1 */

#if uECC_SUPPORTS_secp224r1

#if uECC_SUPPORT_COMPRESSED_POINT
static void mod_sqrt_secp224r1(uECC_word_t *a, uECC_Curve curve);
#endif
#if (uECC_OPTIMIZATION_LEVEL > 0)
static void vli_mmod_fast_secp224r1(uECC_word_t *result, uECC_word_t *product);
#endif

static const struct uECC_Curve_t curve_secp224r1 = {
    num_words_secp224r1,
    num_bytes_secp224r1,
    224, /* num_n_bits */
    { BYTES_TO_WORDS_8(01, 00, 00, 00, 00, 00, 00, 00),
        BYTES_TO_WORDS_8(00, 00, 00, 00, FF, FF, FF, FF),
        BYTES_TO_WORDS_8(FF, FF, FF, FF, FF, FF, FF, FF),
        BYTES_TO_WORDS_4(FF, FF, FF, FF) },
    { BYTES_TO_WORDS_8(3D, 2A, 5C, 5C, 45, 29, DD, 13),
        BYTES_TO_WORDS_8(3E, F0, B8, E0, A2, 16, FF, FF),
        BYTES_TO_WORDS_8(FF, FF, FF, FF, FF, FF, FF, FF),
        BYTES_TO_WORDS_4(FF, FF, FF, FF) },
    { BYTES_TO_WORDS_8(21, 1D, 5C, 11, D6, 80, 32, 34),
        BYTES_TO_WORDS_8(22, 11, C2, 56, D3, C1, 03, 4A),
        BYTES_TO_WORDS_8(B9, 90, 13, 32, 7F, BF, B4, 6B),
        BYTES_TO_WORDS_4(BD, 0C, 0E, B7),

        BYTES_TO_WORDS_8(34, 7E, 00, 85, 99, 81, D5, 44),
        BYTES_TO_WORDS_8(64, 47, 07, 5A, A0, 75, 43, CD),
        BYTES_TO_WORDS_8(E6, DF, 22, 4C, FB, 23, F7, B5),
        BYTES_TO_WORDS_4(88, 63, 37, BD) },
    { BYTES_TO_WORDS_8(B4, FF, 55, 23, 43, 39, 0B, 27),
        BYTES_TO_WORDS_8(BA, D8, BF, D7, B7, B0, 44, 50),
        BYTES_TO_WORDS_8(56, 32, 41, F5, AB, B3, 04, 0C),
        BYTES_TO_WORDS_4(85, 0A, 05, B4) },
    &double_jacobian_default,
#if uECC_SUPPORT_COMPRESSED_POINT
    &mod_sqrt_secp224r1,
#endif
    &x_side_default,
#if (uECC_OPTIMIZATION_LEVEL > 0)
    &vli_mmod_fast_secp224r1
#endif
};

uECC_Curve uECC_secp224r1(void) { return &curve_secp224r1; }


#if uECC_SUPPORT_COMPRESSED_POINT
/* Routine 3.2.4 RS;  from http://www.nsa.gov/ia/_files/nist-routines.pdf */
static void mod_sqrt_secp224r1_rs(uECC_word_t *d1,
                                  uECC_word_t *e1,
                                  uECC_word_t *f1,
                                  const uECC_word_t *d0,
                                  const uECC_word_t *e0,
                                  const uECC_word_t *f0) {
    uECC_word_t t[num_words_secp224r1];

    uECC_vli_modSquare_fast(t, d0, &curve_secp224r1);                    /* t <-- d0 ^ 2 */
    uECC_vli_modMult_fast(e1, d0, e0, &curve_secp224r1);                 /* e1 <-- d0 * e0 */
    uECC_vli_modAdd(d1, t, f0, curve_secp224r1.p, num_words_secp224r1);  /* d1 <-- t  + f0 */
    uECC_vli_modAdd(e1, e1, e1, curve_secp224r1.p, num_words_secp224r1); /* e1 <-- e1 + e1 */
    uECC_vli_modMult_fast(f1, t, f0, &curve_secp224r1);                  /* f1 <-- t  * f0 */
    uECC_vli_modAdd(f1, f1, f1, curve_secp224r1.p, num_words_secp224r1); /* f1 <-- f1 + f1 */
    uECC_vli_modAdd(f1, f1, f1, curve_secp224r1.p, num_words_secp224r1); /* f1 <-- f1 + f1 */
}

/* Routine 3.2.5 RSS;  from http://www.nsa.gov/ia/_files/nist-routines.pdf */
static void mod_sqrt_secp224r1_rss(uECC_word_t *d1,
                                   uECC_word_t *e1,
                                   uECC_word_t *f1,
                                   const uECC_word_t *d0,
                                   const uECC_word_t *e0,
                                   const uECC_word_t *f0,
                                   const bitcount_t j) {
    bitcount_t i;

    uECC_vli_set(d1, d0, num_words_secp224r1); /* d1 <-- d0 */
    uECC_vli_set(e1, e0, num_words_secp224r1); /* e1 <-- e0 */
    uECC_vli_set(f1, f0, num_words_secp224r1); /* f1 <-- f0 */
    for (i = 1; i <= j; i++) {
        mod_sqrt_secp224r1_rs(d1, e1, f1, d1, e1, f1); /* RS (d1,e1,f1,d1,e1,f1) */
    }
}

/* Routine 3.2.6 RM;  from http://www.nsa.gov/ia/_files/nist-routines.pdf */
static void mod_sqrt_secp224r1_rm(uECC_word_t *d2,
                                  uECC_word_t *e2,
                                  uECC_word_t *f2,
                                  const uECC_word_t *c,
                                  const uECC_word_t *d0,
                                  const uECC_word_t *e0,
                                  const uECC_word_t *d1,
                                  const uECC_word_t *e1) {
    uECC_word_t t1[num_words_secp224r1];
    uECC_word_t t2[num_words_secp224r1];

    uECC_vli_modMult_fast(t1, e0, e1, &curve_secp224r1); /* t1 <-- e0 * e1 */
    uECC_vli_modMult_fast(t1, t1, c, &curve_secp224r1);  /* t1 <-- t1 * c */
    /* t1 <-- p  - t1 */
    uECC_vli_modSub(t1, curve_secp224r1.p, t1, curve_secp224r1.p, num_words_secp224r1);
    uECC_vli_modMult_fast(t2, d0, d1, &curve_secp224r1);                 /* t2 <-- d0 * d1 */
    uECC_vli_modAdd(t2, t2, t1, curve_secp224r1.p, num_words_secp224r1); /* t2 <-- t2 + t1 */
    uECC_vli_modMult_fast(t1, d0, e1, &curve_secp224r1);                 /* t1 <-- d0 * e1 */
    uECC_vli_modMult_fast(e2, d1, e0, &curve_secp224r1);                 /* e2 <-- d1 * e0 */
    uECC_vli_modAdd(e2, e2, t1, curve_secp224r1.p, num_words_secp224r1); /* e2 <-- e2 + t1 */
    uECC_vli_modSquare_fast(f2, e2, &curve_secp224r1);                   /* f2 <-- e2^2 */
    uECC_vli_modMult_fast(f2, f2, c, &curve_secp224r1);                  /* f2 <-- f2 * c */
    /* f2 <-- p  - f2 */
    uECC_vli_modSub(f2, curve_secp224r1.p, f2, curve_secp224r1.p, num_words_secp224r1);
    uECC_vli_set(d2, t2, num_words_secp224r1); /* d2 <-- t2 */
}

/* Routine 3.2.7 RP;  from http://www.nsa.gov/ia/_files/nist-routines.pdf */
static void mod_sqrt_secp224r1_rp(uECC_word_t *d1,
                                  uECC_word_t *e1,
                                  uECC_word_t *f1,
                                  const uECC_word_t *c,
                                  const uECC_word_t *r) {
    wordcount_t i;
    wordcount_t pow2i = 1;
    uECC_word_t d0[num_words_secp224r1];
    uECC_word_t e0[num_words_secp224r1] = {1}; /* e0 <-- 1 */
    uECC_word_t f0[num_words_secp224r1];

    uECC_vli_set(d0, r, num_words_secp224r1); /* d0 <-- r */
    /* f0 <-- p  - c */
    uECC_vli_modSub(f0, curve_secp224r1.p, c, curve_secp224r1.p, num_words_secp224r1);
    for (i = 0; i <= 6; i++) {
        mod_sqrt_secp224r1_rss(d1, e1, f1, d0, e0, f0, pow2i); /* RSS (d1,e1,f1,d0,e0,f0,2^i) */
        mod_sqrt_secp224r1_rm(d1, e1, f1, c, d1, e1, d0, e0);  /* RM (d1,e1,f1,c,d1,e1,d0,e0) */
        uECC_vli_set(d0, d1, num_words_secp224r1); /* d0 <-- d1 */
        uECC_vli_set(e0, e1, num_words_secp224r1); /* e0 <-- e1 */
        uECC_vli_set(f0, f1, num_words_secp224r1); /* f0 <-- f1 */
        pow2i *= 2;
    }
}

/* Compute a = sqrt(a) (mod curve_p). */
/* Routine 3.2.8 mp_mod_sqrt_224; from http://www.nsa.gov/ia/_files/nist-routines.pdf */
static void mod_sqrt_secp224r1(uECC_word_t *a, uECC_Curve curve) {
    bitcount_t i;
    uECC_word_t e1[num_words_secp224r1];
    uECC_word_t f1[num_words_secp224r1];
    uECC_word_t d0[num_words_secp224r1];
    uECC_word_t e0[num_words_secp224r1];
    uECC_word_t f0[num_words_secp224r1];
    uECC_word_t d1[num_words_secp224r1];

    /* s = a; using constant instead of random value */
    mod_sqrt_secp224r1_rp(d0, e0, f0, a, a);           /* RP (d0, e0, f0, c, s) */
    mod_sqrt_secp224r1_rs(d1, e1, f1, d0, e0, f0);     /* RS (d1, e1, f1, d0, e0, f0) */
    for (i = 1; i <= 95; i++) {
        uECC_vli_set(d0, d1, num_words_secp224r1);          /* d0 <-- d1 */
        uECC_vli_set(e0, e1, num_words_secp224r1);          /* e0 <-- e1 */
        uECC_vli_set(f0, f1, num_words_secp224r1);          /* f0 <-- f1 */
        mod_sqrt_secp224r1_rs(d1, e1, f1, d0, e0, f0); /* RS (d1, e1, f1, d0, e0, f0) */
        if (uECC_vli_isZero(d1, num_words_secp224r1)) {     /* if d1 == 0 */
                break;
        }
    }
    uECC_vli_modInv(f1, e0, curve_secp224r1.p, num_words_secp224r1); /* f1 <-- 1 / e0 */
    uECC_vli_modMult_fast(a, d0, f1, &curve_secp224r1);              /* a  <-- d0 / e0 */
}
#endif /* uECC_SUPPORT_COMPRESSED_POINT */

#if (uECC_OPTIMIZATION_LEVEL > 0)
/* Computes result = product % curve_p
   from http://www.nsa.gov/ia/_files/nist-routines.pdf */
#if uECC_WORD_SIZE == 1
static void vli_mmod_fast_secp224r1(uint8_t *result, uint8_t *product) {
    uint8_t tmp[num_words_secp224r1];
    int8_t carry;

    /* t */
    uECC_vli_set(result, product, num_words_secp224r1);

    /* s1 */
    tmp[0] = tmp[1] = tmp[2] = tmp[3] = 0;
    tmp[4] = tmp[5] = tmp[6] = tmp[7] = 0;
    tmp[8] = tmp[9] = tmp[10] = tmp[11] = 0;
    tmp[12] = product[28]; tmp[13] = product[29]; tmp[14] = product[30]; tmp[15] = product[31];
    tmp[16] = product[32]; tmp[17] = product[33]; tmp[18] = product[34]; tmp[19] = product[35];
    tmp[20] = product[36]; tmp[21] = product[37]; tmp[22] = product[38]; tmp[23] = product[39];
    tmp[24] = product[40]; tmp[25] = product[41]; tmp[26] = product[42]; tmp[27] = product[43];
    carry = uECC_vli_add(result, result, tmp, num_words_secp224r1);

    /* s2 */
    tmp[12] = product[44]; tmp[13] = product[45]; tmp[14] = product[46]; tmp[15] = product[47];
    tmp[16] = product[48]; tmp[17] = product[49]; tmp[18] = product[50]; tmp[19] = product[51];
    tmp[20] = product[52]; tmp[21] = product[53]; tmp[22] = product[54]; tmp[23] = product[55];
    tmp[24] = tmp[25] = tmp[26] = tmp[27] = 0;
    carry += uECC_vli_add(result, result, tmp, num_words_secp224r1);

    /* d1 */
    tmp[0]  = product[28]; tmp[1]  = product[29]; tmp[2]  = product[30]; tmp[3]  = product[31];
    tmp[4]  = product[32]; tmp[5]  = product[33]; tmp[6]  = product[34]; tmp[7]  = product[35];
    tmp[8]  = product[36]; tmp[9]  = product[37]; tmp[10] = product[38]; tmp[11] = product[39];
    tmp[12] = product[40]; tmp[13] = product[41]; tmp[14] = product[42]; tmp[15] = product[43];
    tmp[16] = product[44]; tmp[17] = product[45]; tmp[18] = product[46]; tmp[19] = product[47];
    tmp[20] = product[48]; tmp[21] = product[49]; tmp[22] = product[50]; tmp[23] = product[51];
    tmp[24] = product[52]; tmp[25] = product[53]; tmp[26] = product[54]; tmp[27] = product[55];
    carry -= uECC_vli_sub(result, result, tmp, num_words_secp224r1);

    /* d2 */
    tmp[0]  = product[44]; tmp[1]  = product[45]; tmp[2]  = product[46]; tmp[3]  = product[47];
    tmp[4]  = product[48]; tmp[5]  = product[49]; tmp[6]  = product[50]; tmp[7]  = product[51];
    tmp[8]  = product[52]; tmp[9]  = product[53]; tmp[10] = product[54]; tmp[11] = product[55];
    tmp[12] = tmp[13] = tmp[14] = tmp[15] = 0;
    tmp[16] = tmp[17] = tmp[18] = tmp[19] = 0;
    tmp[20] = tmp[21] = tmp[22] = tmp[23] = 0;
    tmp[24] = tmp[25] = tmp[26] = tmp[27] = 0;
    carry -= uECC_vli_sub(result, result, tmp, num_words_secp224r1);

    if (carry < 0) {
        do {
            carry += uECC_vli_add(result, result, curve_secp224r1.p, num_words_secp224r1);
        } while (carry < 0);
    } else {
        while (carry || uECC_vli_cmp_unsafe(curve_secp224r1.p, result, num_words_secp224r1) != 1) {
            carry -= uECC_vli_sub(result, result, curve_secp224r1.p, num_words_secp224r1);
        }
    }
}
#elif uECC_WORD_SIZE == 4
static void vli_mmod_fast_secp224r1(uint32_t *result, uint32_t *product)
{
    uint32_t tmp[num_words_secp224r1];
    int carry;

    /* t */
    uECC_vli_set(result, product, num_words_secp224r1);

    /* s1 */
    tmp[0] = tmp[1] = tmp[2] = 0;
    tmp[3] = product[7];
    tmp[4] = product[8];
    tmp[5] = product[9];
    tmp[6] = product[10];
    carry = uECC_vli_add(result, result, tmp, num_words_secp224r1);

    /* s2 */
    tmp[3] = product[11];
    tmp[4] = product[12];
    tmp[5] = product[13];
    tmp[6] = 0;
    carry += uECC_vli_add(result, result, tmp, num_words_secp224r1);

    /* d1 */
    tmp[0] = product[7];
    tmp[1] = product[8];
    tmp[2] = product[9];
    tmp[3] = product[10];
    tmp[4] = product[11];
    tmp[5] = product[12];
    tmp[6] = product[13];
    carry -= uECC_vli_sub(result, result, tmp, num_words_secp224r1);

    /* d2 */
    tmp[0] = product[11];
    tmp[1] = product[12];
    tmp[2] = product[13];
    tmp[3] = tmp[4] = tmp[5] = tmp[6] = 0;
    carry -= uECC_vli_sub(result, result, tmp, num_words_secp224r1);

    if (carry < 0) {
        do {
            carry += uECC_vli_add(result, result, curve_secp224r1.p, num_words_secp224r1);
        } while (carry < 0);
    } else {
        while (carry || uECC_vli_cmp_unsafe(curve_secp224r1.p, result, num_words_secp224r1) != 1) {
            carry -= uECC_vli_sub(result, result, curve_secp224r1.p, num_words_secp224r1);
        }
    }
}
#else
static void vli_mmod_fast_secp224r1(uint64_t *result, uint64_t *product)
{
    uint64_t tmp[num_words_secp224r1];
    int carry = 0;

    /* t */
    uECC_vli_set(result, product, num_words_secp224r1);
    result[num_words_secp224r1 - 1] &= 0xffffffff;

    /* s1 */
    tmp[0] = 0;
    tmp[1] = product[3] & 0xffffffff00000000ull;
    tmp[2] = product[4];
    tmp[3] = product[5] & 0xffffffff;
    uECC_vli_add(result, result, tmp, num_words_secp224r1);

    /* s2 */
    tmp[1] = product[5] & 0xffffffff00000000ull;
    tmp[2] = product[6];
    tmp[3] = 0;
    uECC_vli_add(result, result, tmp, num_words_secp224r1);

    /* d1 */
    tmp[0] = (product[3] >> 32) | (product[4] << 32);
    tmp[1] = (product[4] >> 32) | (product[5] << 32);
    tmp[2] = (product[5] >> 32) | (product[6] << 32);
    tmp[3] = product[6] >> 32;
    carry -= uECC_vli_sub(result, result, tmp, num_words_secp224r1);

    /* d2 */
    tmp[0] = (product[5] >> 32) | (product[6] << 32);
    tmp[1] = product[6] >> 32;
    tmp[2] = tmp[3] = 0;
    carry -= uECC_vli_sub(result, result, tmp, num_words_secp224r1);

    if (carry < 0) {
        do {
            carry += uECC_vli_add(result, result, curve_secp224r1.p, num_words_secp224r1);
        } while (carry < 0);
    } else {
        while (uECC_vli_cmp_unsafe(curve_secp224r1.p, result, num_words_secp224r1) != 1) {
            uECC_vli_sub(result, result, curve_secp224r1.p, num_words_secp224r1);
        }
    }
}
#endif /* uECC_WORD_SIZE */
#endif /* (uECC_OPTIMIZATION_LEVEL > 0) */

#endif /* uECC_SUPPORTS_secp224r1 */

#if uECC_SUPPORTS_secp256r1

#if (uECC_OPTIMIZATION_LEVEL > 0)
static void vli_mmod_fast_secp256r1(uECC_word_t *result, uECC_word_t *product);
#endif

static const struct uECC_Curve_t curve_secp256r1 = {
    num_words_secp256r1,
    num_bytes_secp256r1,
    256, /* num_n_bits */
    { BYTES_TO_WORDS_8(FF, FF, FF, FF, FF, FF, FF, FF),
        BYTES_TO_WORDS_8(FF, FF, FF, FF, 00, 00, 00, 00),
        BYTES_TO_WORDS_8(00, 00, 00, 00, 00, 00, 00, 00),
        BYTES_TO_WORDS_8(01, 00, 00, 00, FF, FF, FF, FF) },
    { BYTES_TO_WORDS_8(51, 25, 63, FC, C2, CA, B9, F3),
        BYTES_TO_WORDS_8(84, 9E, 17, A7, AD, FA, E6, BC),
        BYTES_TO_WORDS_8(FF, FF, FF, FF, FF, FF, FF, FF),
        BYTES_TO_WORDS_8(00, 00, 00, 00, FF, FF, FF, FF) },
    { BYTES_TO_WORDS_8(96, C2, 98, D8, 45, 39, A1, F4),
        BYTES_TO_WORDS_8(A0, 33, EB, 2D, 81, 7D, 03, 77),
        BYTES_TO_WORDS_8(F2, 40, A4, 63, E5, E6, BC, F8),
        BYTES_TO_WORDS_8(47, 42, 2C, E1, F2, D1, 17, 6B),

        BYTES_TO_WORDS_8(F5, 51, BF, 37, 68, 40, B6, CB),
        BYTES_TO_WORDS_8(CE, 5E, 31, 6B, 57, 33, CE, 2B),
        BYTES_TO_WORDS_8(16, 9E, 0F, 7C, 4A, EB, E7, 8E),
        BYTES_TO_WORDS_8(9B, 7F, 1A, FE, E2, 42, E3, 4F) },
    { BYTES_TO_WORDS_8(4B, 60, D2, 27, 3E, 3C, CE, 3B),
        BYTES_TO_WORDS_8(F6, B0, 53, CC, B0, 06, 1D, 65),
        BYTES_TO_WORDS_8(BC, 86, 98, 76, 55, BD, EB, B3),
        BYTES_TO_WORDS_8(E7, 93, 3A, AA, D8, 35, C6, 5A) },
    &double_jacobian_default,
#if uECC_SUPPORT_COMPRESSED_POINT
    &mod_sqrt_default,
#endif
    &x_side_default,
#if (uECC_OPTIMIZATION_LEVEL > 0)
    &vli_mmod_fast_secp256r1
#endif
};

uECC_Curve uECC_secp256r1(void) { return &curve_secp256r1; }


#if (uECC_OPTIMIZATION_LEVEL > 0 && !asm_mmod_fast_secp256r1)
/* Computes result = product % curve_p
   from http://www.nsa.gov/ia/_files/nist-routines.pdf */
#if uECC_WORD_SIZE == 1
static void vli_mmod_fast_secp256r1(uint8_t *result, uint8_t *product) {
    uint8_t tmp[num_words_secp256r1];
    int8_t carry;
    
    /* t */
    uECC_vli_set(result, product, num_words_secp256r1);
    
    /* s1 */
    tmp[0] = tmp[1] = tmp[2] = tmp[3] = 0;
    tmp[4] = tmp[5] = tmp[6] = tmp[7] = 0;
    tmp[8] = tmp[9] = tmp[10] = tmp[11] = 0;
    tmp[12] = product[44]; tmp[13] = product[45]; tmp[14] = product[46]; tmp[15] = product[47];
    tmp[16] = product[48]; tmp[17] = product[49]; tmp[18] = product[50]; tmp[19] = product[51];
    tmp[20] = product[52]; tmp[21] = product[53]; tmp[22] = product[54]; tmp[23] = product[55];
    tmp[24] = product[56]; tmp[25] = product[57]; tmp[26] = product[58]; tmp[27] = product[59];
    tmp[28] = product[60]; tmp[29] = product[61]; tmp[30] = product[62]; tmp[31] = product[63];
    carry = uECC_vli_add(tmp, tmp, tmp, num_words_secp256r1);
    carry += uECC_vli_add(result, result, tmp, num_words_secp256r1);
    
    /* s2 */
    tmp[12] = product[48]; tmp[13] = product[49]; tmp[14] = product[50]; tmp[15] = product[51];
    tmp[16] = product[52]; tmp[17] = product[53]; tmp[18] = product[54]; tmp[19] = product[55];
    tmp[20] = product[56]; tmp[21] = product[57]; tmp[22] = product[58]; tmp[23] = product[59];
    tmp[24] = product[60]; tmp[25] = product[61]; tmp[26] = product[62]; tmp[27] = product[63];
    tmp[28] = tmp[29] = tmp[30] = tmp[31] = 0;
    carry += uECC_vli_add(tmp, tmp, tmp, num_words_secp256r1);
    carry += uECC_vli_add(result, result, tmp, num_words_secp256r1);
    
    /* s3 */
    tmp[0] = product[32]; tmp[1] = product[33]; tmp[2] = product[34]; tmp[3] = product[35];
    tmp[4] = product[36]; tmp[5] = product[37]; tmp[6] = product[38]; tmp[7] = product[39];
    tmp[8] = product[40]; tmp[9] = product[41]; tmp[10] = product[42]; tmp[11] = product[43];
    tmp[12] = tmp[13] = tmp[14] = tmp[15] = 0;
    tmp[16] = tmp[17] = tmp[18] = tmp[19] = 0;
    tmp[20] = tmp[21] = tmp[22] = tmp[23] = 0;
    tmp[24] = product[56]; tmp[25] = product[57]; tmp[26] = product[58]; tmp[27] = product[59];
    tmp[28] = product[60]; tmp[29] = product[61]; tmp[30] = product[62]; tmp[31] = product[63];
    carry += uECC_vli_add(result, result, tmp, num_words_secp256r1);
    
    /* s4 */
    tmp[0] = product[36]; tmp[1] = product[37]; tmp[2] = product[38]; tmp[3] = product[39];
    tmp[4] = product[40]; tmp[5] = product[41]; tmp[6] = product[42]; tmp[7] = product[43];
    tmp[8] = product[44]; tmp[9] = product[45]; tmp[10] = product[46]; tmp[11] = product[47];
    tmp[12] = product[52]; tmp[13] = product[53]; tmp[14] = product[54]; tmp[15] = product[55];
    tmp[16] = product[56]; tmp[17] = product[57]; tmp[18] = product[58]; tmp[19] = product[59];
    tmp[20] = product[60]; tmp[21] = product[61]; tmp[22] = product[62]; tmp[23] = product[63];
    tmp[24] = product[52]; tmp[25] = product[53]; tmp[26] = product[54]; tmp[27] = product[55];
    tmp[28] = product[32]; tmp[29] = product[33]; tmp[30] = product[34]; tmp[31] = product[35];
    carry += uECC_vli_add(result, result, tmp, num_words_secp256r1);
    
    /* d1 */
    tmp[0] = product[44]; tmp[1] = product[45]; tmp[2] = product[46]; tmp[3] = product[47];
    tmp[4] = product[48]; tmp[5] = product[49]; tmp[6] = product[50]; tmp[7] = product[51];
    tmp[8] = product[52]; tmp[9] = product[53]; tmp[10] = product[54]; tmp[11] = product[55];
    tmp[12] = tmp[13] = tmp[14] = tmp[15] = 0;
    tmp[16] = tmp[17] = tmp[18] = tmp[19] = 0;
    tmp[20] = tmp[21] = tmp[22] = tmp[23] = 0;
    tmp[24] = product[32]; tmp[25] = product[33]; tmp[26] = product[34]; tmp[27] = product[35];
    tmp[28] = product[40]; tmp[29] = product[41]; tmp[30] = product[42]; tmp[31] = product[43];
    carry -= uECC_vli_sub(result, result, tmp, num_words_secp256r1);
    
    /* d2 */
    tmp[0] = product[48]; tmp[1] = product[49]; tmp[2] = product[50]; tmp[3] = product[51];
    tmp[4] = product[52]; tmp[5] = product[53]; tmp[6] = product[54]; tmp[7] = product[55];
    tmp[8] = product[56]; tmp[9] = product[57]; tmp[10] = product[58]; tmp[11] = product[59];
    tmp[12] = product[60]; tmp[13] = product[61]; tmp[14] = product[62]; tmp[15] = product[63];
    tmp[16] = tmp[17] = tmp[18] = tmp[19] = 0;
    tmp[20] = tmp[21] = tmp[22] = tmp[23] = 0;
    tmp[24] = product[36]; tmp[25] = product[37]; tmp[26] = product[38]; tmp[27] = product[39];
    tmp[28] = product[44]; tmp[29] = product[45]; tmp[30] = product[46]; tmp[31] = product[47];
    carry -= uECC_vli_sub(result, result, tmp, num_words_secp256r1);
    
    /* d3 */
    tmp[0] = product[52]; tmp[1] = product[53]; tmp[2] = product[54]; tmp[3] = product[55];
    tmp[4] = product[56]; tmp[5] = product[57]; tmp[6] = product[58]; tmp[7] = product[59];
    tmp[8] = product[60]; tmp[9] = product[61]; tmp[10] = product[62]; tmp[11] = product[63];
    tmp[12] = product[32]; tmp[13] = product[33]; tmp[14] = product[34]; tmp[15] = product[35];
    tmp[16] = product[36]; tmp[17] = product[37]; tmp[18] = product[38]; tmp[19] = product[39];
    tmp[20] = product[40]; tmp[21] = product[41]; tmp[22] = product[42]; tmp[23] = product[43];
    tmp[24] = tmp[25] = tmp[26] = tmp[27] = 0;
    tmp[28] = product[48]; tmp[29] = product[49]; tmp[30] = product[50]; tmp[31] = product[51];
    carry -= uECC_vli_sub(result, result, tmp, num_words_secp256r1);
    
    /* d4 */
    tmp[0] = product[56]; tmp[1] = product[57]; tmp[2] = product[58]; tmp[3] = product[59];
    tmp[4] = product[60]; tmp[5] = product[61]; tmp[6] = product[62]; tmp[7] = product[63];
    tmp[8] = tmp[9] = tmp[10] = tmp[11] = 0;
    tmp[12] = product[36]; tmp[13] = product[37]; tmp[14] = product[38]; tmp[15] = product[39];
    tmp[16] = product[40]; tmp[17] = product[41]; tmp[18] = product[42]; tmp[19] = product[43];
    tmp[20] = product[44]; tmp[21] = product[45]; tmp[22] = product[46]; tmp[23] = product[47];
    tmp[24] = tmp[25] = tmp[26] = tmp[27] = 0;
    tmp[28] = product[52]; tmp[29] = product[53]; tmp[30] = product[54]; tmp[31] = product[55];
    carry -= uECC_vli_sub(result, result, tmp, num_words_secp256r1);
    
    if (carry < 0) {
        do {
            carry += uECC_vli_add(result, result, curve_secp256r1.p, num_words_secp256r1);
        } while (carry < 0);
    } else {
        while (carry || uECC_vli_cmp_unsafe(curve_secp256r1.p, result, num_words_secp256r1) != 1) {
            carry -= uECC_vli_sub(result, result, curve_secp256r1.p, num_words_secp256r1);
        }
    }
}
#elif uECC_WORD_SIZE == 4
static void vli_mmod_fast_secp256r1(uint32_t *result, uint32_t *product) {
    uint32_t tmp[num_words_secp256r1];
    int carry;
    
    /* t */
    uECC_vli_set(result, product, num_words_secp256r1);
    
    /* s1 */
    tmp[0] = tmp[1] = tmp[2] = 0;
    tmp[3] = product[11];
    tmp[4] = product[12];
    tmp[5] = product[13];
    tmp[6] = product[14];
    tmp[7] = product[15];
    carry = uECC_vli_add(tmp, tmp, tmp, num_words_secp256r1);
    carry += uECC_vli_add(result, result, tmp, num_words_secp256r1);
    
    /* s2 */
    tmp[3] = product[12];
    tmp[4] = product[13];
    tmp[5] = product[14];
    tmp[6] = product[15];
    tmp[7] = 0;
    carry += uECC_vli_add(tmp, tmp, tmp, num_words_secp256r1);
    carry += uECC_vli_add(result, result, tmp, num_words_secp256r1);
    
    /* s3 */
    tmp[0] = product[8];
    tmp[1] = product[9];
    tmp[2] = product[10];
    tmp[3] = tmp[4] = tmp[5] = 0;
    tmp[6] = product[14];
    tmp[7] = product[15];
    carry += uECC_vli_add(result, result, tmp, num_words_secp256r1);
    
    /* s4 */
    tmp[0] = product[9];
    tmp[1] = product[10];
    tmp[2] = product[11];
    tmp[3] = product[13];
    tmp[4] = product[14];
    tmp[5] = product[15];
    tmp[6] = product[13];
    tmp[7] = product[8];
    carry += uECC_vli_add(result, result, tmp, num_words_secp256r1);
    
    /* d1 */
    tmp[0] = product[11];
    tmp[1] = product[12];
    tmp[2] = product[13];
    tmp[3] = tmp[4] = tmp[5] = 0;
    tmp[6] = product[8];
    tmp[7] = product[10];
    carry -= uECC_vli_sub(result, result, tmp, num_words_secp256r1);
    
    /* d2 */
    tmp[0] = product[12];
    tmp[1] = product[13];
    tmp[2] = product[14];
    tmp[3] = product[15];
    tmp[4] = tmp[5] = 0;
    tmp[6] = product[9];
    tmp[7] = product[11];
    carry -= uECC_vli_sub(result, result, tmp, num_words_secp256r1);
    
    /* d3 */
    tmp[0] = product[13];
    tmp[1] = product[14];
    tmp[2] = product[15];
    tmp[3] = product[8];
    tmp[4] = product[9];
    tmp[5] = product[10];
    tmp[6] = 0;
    tmp[7] = product[12];
    carry -= uECC_vli_sub(result, result, tmp, num_words_secp256r1);
    
    /* d4 */
    tmp[0] = product[14];
    tmp[1] = product[15];
    tmp[2] = 0;
    tmp[3] = product[9];
    tmp[4] = product[10];
    tmp[5] = product[11];
    tmp[6] = 0;
    tmp[7] = product[13];
    carry -= uECC_vli_sub(result, result, tmp, num_words_secp256r1);
    
    if (carry < 0) {
        do {
            carry += uECC_vli_add(result, result, curve_secp256r1.p, num_words_secp256r1);
        } while (carry < 0);
    } else {
        while (carry || uECC_vli_cmp_unsafe(curve_secp256r1.p, result, num_words_secp256r1) != 1) {
            carry -= uECC_vli_sub(result, result, curve_secp256r1.p, num_words_secp256r1);
        }
    }
}
#else
static void vli_mmod_fast_secp256r1(uint64_t *result, uint64_t *product) {
    uint64_t tmp[num_words_secp256r1];
    int carry;
    
    /* t */
    uECC_vli_set(result, product, num_words_secp256r1);
    
    /* s1 */
    tmp[0] = 0;
    tmp[1] = product[5] & 0xffffffff00000000ull;
    tmp[2] = product[6];
    tmp[3] = product[7];
    carry = (int)uECC_vli_add(tmp, tmp, tmp, num_words_secp256r1);
    carry += uECC_vli_add(result, result, tmp, num_words_secp256r1);
    
    /* s2 */
    tmp[1] = product[6] << 32;
    tmp[2] = (product[6] >> 32) | (product[7] << 32);
    tmp[3] = product[7] >> 32;
    carry += uECC_vli_add(tmp, tmp, tmp, num_words_secp256r1);
    carry += uECC_vli_add(result, result, tmp, num_words_secp256r1);
    
    /* s3 */
    tmp[0] = product[4];
    tmp[1] = product[5] & 0xffffffff;
    tmp[2] = 0;
    tmp[3] = product[7];
    carry += uECC_vli_add(result, result, tmp, num_words_secp256r1);
    
    /* s4 */
    tmp[0] = (product[4] >> 32) | (product[5] << 32);
    tmp[1] = (product[5] >> 32) | (product[6] & 0xffffffff00000000ull);
    tmp[2] = product[7];
    tmp[3] = (product[6] >> 32) | (product[4] << 32);
    carry += uECC_vli_add(result, result, tmp, num_words_secp256r1);
    
    /* d1 */
    tmp[0] = (product[5] >> 32) | (product[6] << 32);
    tmp[1] = (product[6] >> 32);
    tmp[2] = 0;
    tmp[3] = (product[4] & 0xffffffff) | (product[5] << 32);
    carry -= uECC_vli_sub(result, result, tmp, num_words_secp256r1);
    
    /* d2 */
    tmp[0] = product[6];
    tmp[1] = product[7];
    tmp[2] = 0;
    tmp[3] = (product[4] >> 32) | (product[5] & 0xffffffff00000000ull);
    carry -= uECC_vli_sub(result, result, tmp, num_words_secp256r1);
    
    /* d3 */
    tmp[0] = (product[6] >> 32) | (product[7] << 32);
    tmp[1] = (product[7] >> 32) | (product[4] << 32);
    tmp[2] = (product[4] >> 32) | (product[5] << 32);
    tmp[3] = (product[6] << 32);
    carry -= uECC_vli_sub(result, result, tmp, num_words_secp256r1);
    
    /* d4 */
    tmp[0] = product[7];
    tmp[1] = product[4] & 0xffffffff00000000ull;
    tmp[2] = product[5];
    tmp[3] = product[6] & 0xffffffff00000000ull;
    carry -= uECC_vli_sub(result, result, tmp, num_words_secp256r1);
    
    if (carry < 0) {
        do {
            carry += uECC_vli_add(result, result, curve_secp256r1.p, num_words_secp256r1);
        } while (carry < 0);
    } else {
        while (carry || uECC_vli_cmp_unsafe(curve_secp256r1.p, result, num_words_secp256r1) != 1) {
            carry -= uECC_vli_sub(result, result, curve_secp256r1.p, num_words_secp256r1);
        }
    }
}
#endif /* uECC_WORD_SIZE */
#endif /* (uECC_OPTIMIZATION_LEVEL > 0 && !asm_mmod_fast_secp256r1) */

#endif /* uECC_SUPPORTS_secp256r1 */

#if uECC_SUPPORTS_secp256k1

static void double_jacobian_secp256k1(uECC_word_t * X1,
                                      uECC_word_t * Y1,
                                      uECC_word_t * Z1,
                                      uECC_Curve curve);
static void x_side_secp256k1(uECC_word_t *result, const uECC_word_t *x, uECC_Curve curve);
#if (uECC_OPTIMIZATION_LEVEL > 0)
static void vli_mmod_fast_secp256k1(uECC_word_t *result, uECC_word_t *product);
#endif

static const struct uECC_Curve_t curve_secp256k1 = {
    num_words_secp256k1,
    num_bytes_secp256k1,
    256, /* num_n_bits */
    { BYTES_TO_WORDS_8(2F, FC, FF, FF, FE, FF, FF, FF),
        BYTES_TO_WORDS_8(FF, FF, FF, FF, FF, FF, FF, FF),
        BYTES_TO_WORDS_8(FF, FF, FF, FF, FF, FF, FF, FF),
        BYTES_TO_WORDS_8(FF, FF, FF, FF, FF, FF, FF, FF) },
    { BYTES_TO_WORDS_8(41, 41, 36, D0, 8C, 5E, D2, BF),
        BYTES_TO_WORDS_8(3B, A0, 48, AF, E6, DC, AE, BA),
        BYTES_TO_WORDS_8(FE, FF, FF, FF, FF, FF, FF, FF),
        BYTES_TO_WORDS_8(FF, FF, FF, FF, FF, FF, FF, FF) },
    { BYTES_TO_WORDS_8(98, 17, F8, 16, 5B, 81, F2, 59),
        BYTES_TO_WORDS_8(D9, 28, CE, 2D, DB, FC, 9B, 02),
        BYTES_TO_WORDS_8(07, 0B, 87, CE, 95, 62, A0, 55),
        BYTES_TO_WORDS_8(AC, BB, DC, F9, 7E, 66, BE, 79),

        BYTES_TO_WORDS_8(B8, D4, 10, FB, 8F, D0, 47, 9C),
        BYTES_TO_WORDS_8(19, 54, 85, A6, 48, B4, 17, FD),
        BYTES_TO_WORDS_8(A8, 08, 11, 0E, FC, FB, A4, 5D),
        BYTES_TO_WORDS_8(65, C4, A3, 26, 77, DA, 3A, 48) },
    { BYTES_TO_WORDS_8(07, 00, 00, 00, 00, 00, 00, 00),
        BYTES_TO_WORDS_8(00, 00, 00, 00, 00, 00, 00, 00),
        BYTES_TO_WORDS_8(00, 00, 00, 00, 00, 00, 00, 00),
        BYTES_TO_WORDS_8(00, 00, 00, 00, 00, 00, 00, 00) },
    &double_jacobian_secp256k1,
#if uECC_SUPPORT_COMPRESSED_POINT
    &mod_sqrt_default,
#endif
    &x_side_secp256k1,
#if (uECC_OPTIMIZATION_LEVEL > 0)
    &vli_mmod_fast_secp256k1
#endif
};

uECC_Curve uECC_secp256k1(void) { return &curve_secp256k1; }


/* Double in place */
static void double_jacobian_secp256k1(uECC_word_t * X1,
                                      uECC_word_t * Y1,
                                      uECC_word_t * Z1,
                                      uECC_Curve curve) {
    /* t1 = X, t2 = Y, t3 = Z */
    uECC_word_t t4[num_words_secp256k1];
    uECC_word_t t5[num_words_secp256k1];
    
    if (uECC_vli_isZero(Z1, num_words_secp256k1)) {
        return;
    }
    
    uECC_vli_modSquare_fast(t5, Y1, curve);   /* t5 = y1^2 */
    uECC_vli_modMult_fast(t4, X1, t5, curve); /* t4 = x1*y1^2 = A */
    uECC_vli_modSquare_fast(X1, X1, curve);   /* t1 = x1^2 */
    uECC_vli_modSquare_fast(t5, t5, curve);   /* t5 = y1^4 */
    uECC_vli_modMult_fast(Z1, Y1, Z1, curve); /* t3 = y1*z1 = z3 */
    
    uECC_vli_modAdd(Y1, X1, X1, curve->p, num_words_secp256k1); /* t2 = 2*x1^2 */
    uECC_vli_modAdd(Y1, Y1, X1, curve->p, num_words_secp256k1); /* t2 = 3*x1^2 */
    if (uECC_vli_testBit(Y1, 0)) {
        uECC_word_t carry = uECC_vli_add(Y1, Y1, curve->p, num_words_secp256k1);
        uECC_vli_rshift1(Y1, num_words_secp256k1);
        Y1[num_words_secp256k1 - 1] |= carry << (uECC_WORD_BITS - 1);
    } else {
        uECC_vli_rshift1(Y1, num_words_secp256k1);
    }
    /* t2 = 3/2*(x1^2) = B */
    
    uECC_vli_modSquare_fast(X1, Y1, curve);                     /* t1 = B^2 */
    uECC_vli_modSub(X1, X1, t4, curve->p, num_words_secp256k1); /* t1 = B^2 - A */
    uECC_vli_modSub(X1, X1, t4, curve->p, num_words_secp256k1); /* t1 = B^2 - 2A = x3 */
    
    uECC_vli_modSub(t4, t4, X1, curve->p, num_words_secp256k1); /* t4 = A - x3 */
    uECC_vli_modMult_fast(Y1, Y1, t4, curve);                   /* t2 = B * (A - x3) */
    uECC_vli_modSub(Y1, Y1, t5, curve->p, num_words_secp256k1); /* t2 = B * (A - x3) - y1^4 = y3 */
}

/* Computes result = x^3 + b. result must not overlap x. */
static void x_side_secp256k1(uECC_word_t *result, const uECC_word_t *x, uECC_Curve curve) {
    uECC_vli_modSquare_fast(result, x, curve);                                /* r = x^2 */
    uECC_vli_modMult_fast(result, result, x, curve);                          /* r = x^3 */
    uECC_vli_modAdd(result, result, curve->b, curve->p, num_words_secp256k1); /* r = x^3 + b */
}

#if (uECC_OPTIMIZATION_LEVEL > 0)
static void omega_mult_secp256k1(uECC_word_t *result, const uECC_word_t *right);
static void vli_mmod_fast_secp256k1(uECC_word_t *result, uECC_word_t *product) {
    uECC_word_t tmp[2 * num_words_secp256k1];
    uECC_word_t carry;
    
    uECC_vli_clear(tmp, num_words_secp256k1);
    uECC_vli_clear(tmp + num_words_secp256k1, num_words_secp256k1);
    
    omega_mult_secp256k1(tmp, product + num_words_secp256k1); /* (Rq, q) = q * c */
    
    carry = uECC_vli_add(result, product, tmp, num_words_secp256k1); /* (C, r) = r + q       */
    uECC_vli_clear(product, num_words_secp256k1);
    omega_mult_secp256k1(product, tmp + num_words_secp256k1); /* Rq*c */
    carry += uECC_vli_add(result, result, product, num_words_secp256k1); /* (C1, r) = r + Rq*c */
    
    while (carry > 0) {
        --carry;
        uECC_vli_sub(result, result, curve_secp256k1.p, num_words_secp256k1);
    }
    if (uECC_vli_cmp_unsafe(result, curve_secp256k1.p, num_words_secp256k1) > 0) {
        uECC_vli_sub(result, result, curve_secp256k1.p, num_words_secp256k1);
    }
}

#if uECC_WORD_SIZE == 1
static void omega_mult_secp256k1(uint8_t * result, const uint8_t * right) {
    /* Multiply by (2^32 + 2^9 + 2^8 + 2^7 + 2^6 + 2^4 + 1). */
    uECC_word_t r0 = 0;
    uECC_word_t r1 = 0;
    uECC_word_t r2 = 0;
    wordcount_t k;
    
    /* Multiply by (2^9 + 2^8 + 2^7 + 2^6 + 2^4 + 1). */
    muladd(0xD1, right[0], &r0, &r1, &r2);
    result[0] = r0;
    r0 = r1;
    r1 = r2;
    /* r2 is still 0 */
    
    for (k = 1; k < num_words_secp256k1; ++k) {
        muladd(0x03, right[k - 1], &r0, &r1, &r2);
        muladd(0xD1, right[k], &r0, &r1, &r2);
        result[k] = r0;
        r0 = r1;
        r1 = r2;
        r2 = 0;
    }
    muladd(0x03, right[num_words_secp256k1 - 1], &r0, &r1, &r2);
    result[num_words_secp256k1] = r0;
    result[num_words_secp256k1 + 1] = r1;
    /* add the 2^32 multiple */
    result[4 + num_words_secp256k1] =
        uECC_vli_add(result + 4, result + 4, right, num_words_secp256k1); 
}
#elif uECC_WORD_SIZE == 4
static void omega_mult_secp256k1(uint32_t * result, const uint32_t * right) {
    /* Multiply by (2^9 + 2^8 + 2^7 + 2^6 + 2^4 + 1). */
    uint32_t carry = 0;
    wordcount_t k;
    
    for (k = 0; k < num_words_secp256k1; ++k) {
        uint64_t p = (uint64_t)0x3D1 * right[k] + carry;
        result[k] = p;
        carry = p >> 32;
    }
    result[num_words_secp256k1] = carry;
    /* add the 2^32 multiple */
    result[1 + num_words_secp256k1] =
        uECC_vli_add(result + 1, result + 1, right, num_words_secp256k1); 
}
#else
static void omega_mult_secp256k1(uint64_t * result, const uint64_t * right) {
    uECC_word_t r0 = 0;
    uECC_word_t r1 = 0;
    uECC_word_t r2 = 0;
    wordcount_t k;
    
    /* Multiply by (2^32 + 2^9 + 2^8 + 2^7 + 2^6 + 2^4 + 1). */
    for (k = 0; k < num_words_secp256k1; ++k) {
        muladd(0x1000003D1ull, right[k], &r0, &r1, &r2);
        result[k] = r0;
        r0 = r1;
        r1 = r2;
        r2 = 0;
    }
    result[num_words_secp256k1] = r0;
}
#endif /* uECC_WORD_SIZE */
#endif /* (uECC_OPTIMIZATION_LEVEL > 0) */

#endif /* uECC_SUPPORTS_secp256k1 */

#endif /* _UECC_CURVE_SPECIFIC_H_ */


================================================
FILE: u2f/desktop_test.cpp
================================================
//to prevent arduino IDE from compiling this
#ifdef IS_DESKTOP_TEST

//test in desktop
#define _POSIX_C_SOURCE 200809L

#include <sys/time.h>
#include <inttypes.h>
#include <math.h>
#include <stdio.h>
#include <time.h>
#include <stdlib.h>

//for storing fake eprom
#include <map>
//for fake input
#include <vector>
#include <string>
#include <iostream>
#include <fstream>
#include <cctype>

#define DESKTOP_TEST

//fake eprom
#define F(X) X

typedef unsigned char byte;

//this is used for random

enum OUTPUT_FORMAT_ENUM
{
    HEX = 1
};

std::vector<std::string> fake_input;
int current_fake_input;

int hexchar2int(int c)
{
    if (c<='9')
	return c-'0';
    return 10 + (c-'A');
}

void hex2bytes(const std::string & inp, unsigned char **res, int *len)
{
    std::string tmp;
    //ignore non hex characters (e.g: space, tab)
    for (size_t i = 0; i < inp.size(); i++) {
	int c = toupper(inp[i]);
	if ((c>='0' && c<='9') || (c>='A' && c<='F')) {
	    tmp += (char)c;
	}
    }
    *len = tmp.size()/2;    
    unsigned char *tmpres = (unsigned char *)malloc(*len);
    size_t j =0;
    for (size_t i = 0; i < tmp.size(); i+=2, j++) {
	unsigned int c1 = hexchar2int(tmp[i]);
	unsigned int c2 = hexchar2int(tmp[i+1]);
	unsigned int c = (c1 << 4) + c2;
	tmpres[j] = c;
    }
    *res = tmpres;

}

int get_next_fake_input(unsigned char **res)
{
    if (current_fake_input >= fake_input.size())
	return -1;
	    
    int len;
    printf("CURRENT FAKE INPUT: %s\n", fake_input[current_fake_input].c_str());
    hex2bytes(fake_input[current_fake_input++], res, &len);
    return len;   
}

void read_file(const std::string & filename)
{
    std::ifstream file(filename);
    std::string temp;
    while(std::getline(file, temp)) {
	fake_input.push_back(temp);
    }
    current_fake_input = 0;
}


int RNG(uint8_t *dest, unsigned size)
{
    for (int i =0; i < size; i++) {
	dest[i] = rand() % 255;
    }
    return 1;
}

long system_millis()
{
#if 0    
    long            ms; // Milliseconds
    time_t          s;  // Seconds
    struct timespec spec;
    
    clock_gettime(CLOCK_REALTIME, &spec);

    s  = spec.tv_sec;
    ms = round(spec.tv_nsec / 1.0e6); // Convert nanoseconds to milliseconds

    return s*1000 + ms;
#endif
    struct timeval  tv;
    gettimeofday(&tv, NULL);
    
    long time_in_mill = 
	(tv.tv_sec) * 1000 + (tv.tv_usec) / 1000 ; // convert tv_sec & tv_usec to millisecond
    return time_in_mill;
}


int millis()
{
    
    return 0;
}

void delayMicroseconds(int micro)
{

}

class EEPROMClass {
    std::map<int, unsigned int> values;
public:
    void get(int address, unsigned int &value)
	{
	    value =  values[address];
	}
    void put(int address, int value)
	{
	    values[address] = value;
	}
};

class SerialClass {

public:
	void begin(int speed) {
} 
void print(const char *msg)
{
    printf("%s", msg);
}
    void println() {
	printf("\n");
    }
void println(const char *msg)
	{
	    printf("%s\n", msg);
	}
void println(int number)
	{
	    printf("%d\n", number);
	}
void print(int number, OUTPUT_FORMAT_ENUM e)
	{
	    printf("%02x", number);
	}
void println(int number, OUTPUT_FORMAT_ENUM e)
	{
	    printf("%02x", number);
	}
};

class RawHIDClass {
public:
void send(byte *buffer, int to)
	{
	    printf("HID SEND: ");
	    for (int i =0; i < 64; i++) {
		printf("%02x ", buffer[i]);
	    }
	    printf("\n");
	}
int recv(byte *buffer, int timeout)
	{
	    unsigned char *inp;
	    int len = get_next_fake_input(&inp);
	    if (len==-1) {
		printf("END OF INPUT\n");
		exit(0);
	    }
	    printf("HID READ: ");
	    for (int i =0; i < len; i++) {
		printf("%02x ", inp[i]);
	    }
	    printf("\n");
	    memcpy(buffer, inp, len);
	    free(inp);	    
	    return len;
	}
};

SerialClass Serial;
RawHIDClass RawHID;
EEPROMClass EEPROM;


#include "u2f.ino"

int main(int argc, char *argv[])
{
    if (argc<2) {
	printf("usage desktop_test <INPUT>\n");
	return 0;
    }
    read_file(argv[1]);
    setup();
    while (1) {
	loop();
    }
}

#endif


================================================
FILE: u2f/platform-specific.h
================================================
/* Copyright 2015, Kenneth MacKay. Licensed under the BSD 2-clause license. */

#ifndef _UECC_PLATFORM_SPECIFIC_H_
#define _UECC_PLATFORM_SPECIFIC_H_

#include "types.h"

#if (defined(_WIN32) || defined(_WIN64))
/* Windows */

#define WIN32_LEAN_AND_MEAN
#include <windows.h>
#include <wincrypt.h>

static int default_RNG(uint8_t *dest, unsigned size) {
    HCRYPTPROV prov;
    if (!CryptAcquireContext(&prov, NULL, NULL, PROV_RSA_FULL, CRYPT_VERIFYCONTEXT)) {
        return 0;
    }

    CryptGenRandom(prov, size, (BYTE *)dest);
    CryptReleaseContext(prov, 0);
    return 1;
}
#define default_RNG_defined 1

#elif defined(unix) || defined(__linux__) || defined(__unix__) || defined(__unix) || \
    (defined(__APPLE__) && defined(__MACH__)) || defined(uECC_POSIX)

/* Some POSIX-like system with /dev/urandom or /dev/random. */
#include <sys/types.h>
#include <fcntl.h>
#include <unistd.h>

#ifndef O_CLOEXEC
    #define O_CLOEXEC 0
#endif

static int default_RNG(uint8_t *dest, unsigned size) {
    int fd = open("/dev/urandom", O_RDONLY | O_CLOEXEC);
    if (fd == -1) {
        fd = open("/dev/random", O_RDONLY | O_CLOEXEC);
        if (fd == -1) {
            return 0;
        }
    }
    
    char *ptr = (char *)dest;
    size_t left = size;
    while (left > 0) {
        ssize_t bytes_read = read(fd, ptr, left);
        if (bytes_read <= 0) { // read failed
            close(fd);
            return 0;
        }
        left -= bytes_read;
        ptr += bytes_read;
    }
    
    close(fd);
    return 1;
}
#define default_RNG_defined 1

#endif /* platform */

#endif /* _UECC_PLATFORM_SPECIFIC_H_ */


================================================
FILE: u2f/sha256.c
================================================
/*********************************************************************
* Filename:   sha256.c
* Author:     Brad Conte (brad AT bradconte.com)
* Copyright:
* Disclaimer: This code is presented "as is" without any guarantees.
* Details:    Implementation of the SHA-256 hashing algorithm.
              SHA-256 is one of the three algorithms in the SHA2
              specification. The others, SHA-384 and SHA-512, are not
              offered in this implementation.
              Algorithm specification can be found here:
               * http://csrc.nist.gov/publications/fips/fips180-2/fips180-2withchangenotice.pdf
              This implementation uses little endian byte order.
*********************************************************************/

/*************************** HEADER FILES ***************************/
#include <stdlib.h>
#include <string.h>
//#include <memory.h>
#include "sha256.h"

#ifdef __cplusplus
extern "C"
{
#endif


/****************************** MACROS ******************************/
#define ROTLEFT(a,b) (((a) << (b)) | ((a) >> (32-(b))))
#define ROTRIGHT(a,b) (((a) >> (b)) | ((a) << (32-(b))))

#define CH(x,y,z) (((x) & (y)) ^ (~(x) & (z)))
#define MAJ(x,y,z) (((x) & (y)) ^ ((x) & (z)) ^ ((y) & (z)))
#define EP0(x) (ROTRIGHT(x,2) ^ ROTRIGHT(x,13) ^ ROTRIGHT(x,22))
#define EP1(x) (ROTRIGHT(x,6) ^ ROTRIGHT(x,11) ^ ROTRIGHT(x,25))
#define SIG0(x) (ROTRIGHT(x,7) ^ ROTRIGHT(x,18) ^ ((x) >> 3))
#define SIG1(x) (ROTRIGHT(x,17) ^ ROTRIGHT(x,19) ^ ((x) >> 10))

/**************************** VARIABLES *****************************/
static const WORD k[64] = {
	0x428a2f98,0x71374491,0xb5c0fbcf,0xe9b5dba5,0x3956c25b,0x59f111f1,0x923f82a4,0xab1c5ed5,
	0xd807aa98,0x12835b01,0x243185be,0x550c7dc3,0x72be5d74,0x80deb1fe,0x9bdc06a7,0xc19bf174,
	0xe49b69c1,0xefbe4786,0x0fc19dc6,0x240ca1cc,0x2de92c6f,0x4a7484aa,0x5cb0a9dc,0x76f988da,
	0x983e5152,0xa831c66d,0xb00327c8,0xbf597fc7,0xc6e00bf3,0xd5a79147,0x06ca6351,0x14292967,
	0x27b70a85,0x2e1b2138,0x4d2c6dfc,0x53380d13,0x650a7354,0x766a0abb,0x81c2c92e,0x92722c85,
	0xa2bfe8a1,0xa81a664b,0xc24b8b70,0xc76c51a3,0xd192e819,0xd6990624,0xf40e3585,0x106aa070,
	0x19a4c116,0x1e376c08,0x2748774c,0x34b0bcb5,0x391c0cb3,0x4ed8aa4a,0x5b9cca4f,0x682e6ff3,
	0x748f82ee,0x78a5636f,0x84c87814,0x8cc70208,0x90befffa,0xa4506ceb,0xbef9a3f7,0xc67178f2
};

/*********************** FUNCTION DEFINITIONS ***********************/
void sha256_transform(SHA256_CTX *ctx, const BYTE data[])
{
	WORD a, b, c, d, e, f, g, h, i, j, t1, t2, m[64];

	for (i = 0, j = 0; i < 16; ++i, j += 4)
		m[i] = (data[j] << 24) | (data[j + 1] << 16) | (data[j + 2] << 8) | (data[j + 3]);
	for ( ; i < 64; ++i)
		m[i] = SIG1(m[i - 2]) + m[i - 7] + SIG0(m[i - 15]) + m[i - 16];

	a = ctx->state[0];
	b = ctx->state[1];
	c = ctx->state[2];
	d = ctx->state[3];
	e = ctx->state[4];
	f = ctx->state[5];
	g = ctx->state[6];
	h = ctx->state[7];

	for (i = 0; i < 64; ++i) {
		t1 = h + EP1(e) + CH(e,f,g) + k[i] + m[i];
		t2 = EP0(a) + MAJ(a,b,c);
		h = g;
		g = f;
		f = e;
		e = d + t1;
		d = c;
		c = b;
		b = a;
		a = t1 + t2;
	}

	ctx->state[0] += a;
	ctx->state[1] += b;
	ctx->state[2] += c;
	ctx->state[3] += d;
	ctx->state[4] += e;
	ctx->state[5] += f;
	ctx->state[6] += g;
	ctx->state[7] += h;
}

void sha256_init(SHA256_CTX *ctx)
{
	ctx->datalen = 0;
	ctx->bitlen = 0;
	ctx->state[0] = 0x6a09e667;
	ctx->state[1] = 0xbb67ae85;
	ctx->state[2] = 0x3c6ef372;
	ctx->state[3] = 0xa54ff53a;
	ctx->state[4] = 0x510e527f;
	ctx->state[5] = 0x9b05688c;
	ctx->state[6] = 0x1f83d9ab;
	ctx->state[7] = 0x5be0cd19;
}

void sha256_update(SHA256_CTX *ctx, const BYTE data[], size_t len)
{
	WORD i;

	for (i = 0; i < len; ++i) {
		ctx->data[ctx->datalen] = data[i];
		ctx->datalen++;
		if (ctx->datalen == 64) {
			sha256_transform(ctx, ctx->data);
			ctx->bitlen += 512;
			ctx->datalen = 0;
		}
	}
}

void sha256_final(SHA256_CTX *ctx, BYTE hash[])
{
	WORD i;

	i = ctx->datalen;

	// Pad whatever data is left in the buffer.
	if (ctx->datalen < 56) {
		ctx->data[i++] = 0x80;
		while (i < 56)
			ctx->data[i++] = 0x00;
	}
	else {
		ctx->data[i++] = 0x80;
		while (i < 64)
			ctx->data[i++] = 0x00;
		sha256_transform(ctx, ctx->data);
		memset(ctx->data, 0, 56);
	}

	// Append to the padding the total message's length in bits and transform.
	ctx->bitlen += ctx->datalen * 8;
	ctx->data[63] = ctx->bitlen;
	ctx->data[62] = ctx->bitlen >> 8;
	ctx->data[61] = ctx->bitlen >> 16;
	ctx->data[60] = ctx->bitlen >> 24;
	ctx->data[59] = ctx->bitlen >> 32;
	ctx->data[58] = ctx->bitlen >> 40;
	ctx->data[57] = ctx->bitlen >> 48;
	ctx->data[56] = ctx->bitlen >> 56;
	sha256_transform(ctx, ctx->data);

	// Since this implementation uses little endian byte ordering and SHA uses big endian,
	// reverse all the bytes when copying the final state to the output hash.
	for (i = 0; i < 4; ++i) {
		hash[i]      = (ctx->state[0] >> (24 - i * 8)) & 0x000000ff;
		hash[i + 4]  = (ctx->state[1] >> (24 - i * 8)) & 0x000000ff;
		hash[i + 8]  = (ctx->state[2] >> (24 - i * 8)) & 0x000000ff;
		hash[i + 12] = (ctx->state[3] >> (24 - i * 8)) & 0x000000ff;
		hash[i + 16] = (ctx->state[4] >> (24 - i * 8)) & 0x000000ff;
		hash[i + 20] = (ctx->state[5] >> (24 - i * 8)) & 0x000000ff;
		hash[i + 24] = (ctx->state[6] >> (24 - i * 8)) & 0x000000ff;
		hash[i + 28] = (ctx->state[7] >> (24 - i * 8)) & 0x000000ff;
	}
}

#ifdef __cplusplus
}
#endif


================================================
FILE: u2f/sha256.h
================================================
/*********************************************************************
* Filename:   sha256.h
* Author:     Brad Conte (brad AT bradconte.com)
* Copyright:
* Disclaimer: This code is presented "as is" without any guarantees.
* Details:    Defines the API for the corresponding SHA1 implementation.
*********************************************************************/

#ifndef SHA256_H
#define SHA256_H

/*************************** HEADER FILES ***************************/
#include <stddef.h>

/****************************** MACROS ******************************/
#define SHA256_BLOCK_SIZE 32            // SHA256 outputs a 32 byte digest


/**************************** DATA TYPES ****************************/
typedef unsigned char BYTE;             // 8-bit byte
typedef unsigned int  WORD;             // 32-bit word, change to "long" for 16-bit machines

typedef struct {
	BYTE data[64];
	WORD datalen;
	unsigned long long bitlen;
	WORD state[8];
} SHA256_CTX;

/*********************** FUNCTION DECLARATIONS **********************/

#ifdef __cplusplus
extern "C"
{
#endif

	
void sha256_init(SHA256_CTX *ctx);
void sha256_update(SHA256_CTX *ctx, const BYTE data[], size_t len);
void sha256_final(SHA256_CTX *ctx, BYTE hash[]);

#ifdef __cplusplus
}
#endif


#endif   // SHA256_H


================================================
FILE: u2f/types.h
================================================
/* Copyright 2015, Kenneth MacKay. Licensed under the BSD 2-clause license. */

#ifndef _UECC_TYPES_H_
#define _UECC_TYPES_H_

#ifndef uECC_PLATFORM
    #if __AVR__
        #define uECC_PLATFORM uECC_avr
    #elif defined(__thumb2__) || defined(_M_ARMT) /* I think MSVC only supports Thumb-2 targets */
        #define uECC_PLATFORM uECC_arm_thumb2
    #elif defined(__thumb__)
        #define uECC_PLATFORM uECC_arm_thumb
    #elif defined(__arm__) || defined(_M_ARM)
        #define uECC_PLATFORM uECC_arm
    #elif defined(__aarch64__)
        #define uECC_PLATFORM uECC_arm64
    #elif defined(__i386__) || defined(_M_IX86) || defined(_X86_) || defined(__I86__)
        #define uECC_PLATFORM uECC_x86
    #elif defined(__amd64__) || defined(_M_X64)
        #define uECC_PLATFORM uECC_x86_64
    #else
        #define uECC_PLATFORM uECC_arch_other
    #endif
#endif

#ifndef uECC_WORD_SIZE
    #if uECC_PLATFORM == uECC_avr
        #define uECC_WORD_SIZE 1
    #elif (uECC_PLATFORM == uECC_x86_64 || uECC_PLATFORM == uECC_arm64)
        #define uECC_WORD_SIZE 8
    #else
        #define uECC_WORD_SIZE 4
    #endif
#endif

#if (uECC_WORD_SIZE != 1) && (uECC_WORD_SIZE != 4) && (uECC_WORD_SIZE != 8)
    #error "Unsupported value for uECC_WORD_SIZE"
#endif

#if ((uECC_PLATFORM == uECC_avr) && (uECC_WORD_SIZE != 1))
    #pragma message ("uECC_WORD_SIZE must be 1 for AVR")
    #undef uECC_WORD_SIZE
    #define uECC_WORD_SIZE 1
#endif

#if ((uECC_PLATFORM == uECC_arm || uECC_PLATFORM == uECC_arm_thumb || \
        uECC_PLATFORM ==  uECC_arm_thumb2) && \
     (uECC_WORD_SIZE != 4))
    #pragma message ("uECC_WORD_SIZE must be 4 for ARM")
    #undef uECC_WORD_SIZE
    #define uECC_WORD_SIZE 4
#endif

#if defined(__SIZEOF_INT128__) || ((__clang_major__ * 100 + __clang_minor__) >= 302)
    #define SUPPORTS_INT128 1
#else
    #define SUPPORTS_INT128 0
#endif

typedef int8_t wordcount_t;
typedef int16_t bitcount_t;
typedef int8_t cmpresult_t;

#if (uECC_WORD_SIZE == 1)

typedef uint8_t uECC_word_t;
typedef uint16_t uECC_dword_t;

#define HIGH_BIT_SET 0x80
#define uECC_WORD_BITS 8
#define uECC_WORD_BITS_SHIFT 3
#define uECC_WORD_BITS_MASK 0x07

#elif (uECC_WORD_SIZE == 4)

typedef uint32_t uECC_word_t;
typedef uint64_t uECC_dword_t;

#define HIGH_BIT_SET 0x80000000
#define uECC_WORD_BITS 32
#define uECC_WORD_BITS_SHIFT 5
#define uECC_WORD_BITS_MASK 0x01F

#elif (uECC_WORD_SIZE == 8)

typedef uint64_t uECC_word_t;
#if SUPPORTS_INT128
typedef unsigned __int128 uECC_dword_t;
#endif

#define HIGH_BIT_SET 0x8000000000000000ull
#define uECC_WORD_BITS 64
#define uECC_WORD_BITS_SHIFT 6
#define uECC_WORD_BITS_MASK 0x03F

#endif /* uECC_WORD_SIZE */

#endif /* _UECC_TYPES_H_ */


================================================
FILE: u2f/u2f.ino
================================================
#ifndef DESKTOP_TEST
#include <EEPROM.h>
#endif
#include <string.h>

#include "sha256.h"
#include "uECC.h"

#undef DEBUG
#define DEBUG

#define CID_BROADCAST           0xffffffff  // Broadcast channel id

#define TYPE_MASK               0x80  // Frame type mask
#define TYPE_INIT               0x80  // Initial frame identifier
#define TYPE_CONT               0x00  // Continuation frame identifier


#define U2FHID_PING         (TYPE_INIT | 0x01)  // Echo data through local processor only
#define U2FHID_MSG          (TYPE_INIT | 0x03)  // Send U2F message frame
#define U2FHID_LOCK         (TYPE_INIT | 0x04)  // Send lock channel command
#define U2FHID_INIT         (TYPE_INIT | 0x06)  // Channel initialization
#define U2FHID_WINK         (TYPE_INIT | 0x08)  // Send device identification wink
#define U2FHID_ERROR        (TYPE_INIT | 0x3f)  // Error response

// Errors
#define ERR_NONE  0
#define ERR_INVALID_CMD  1
#define ERR_INVALID_PAR  2
#define ERR_INVALID_LEN  3
#define ERR_INVALID_SEQ  4
#define ERR_MSG_TIMEOUT  5
#define ERR_CHANNEL_BUSY  6
#define ERR_LOCK_REQUIRED  10
#define ERR_INVALID_CID  11
#define ERR_OTHER  127

#define U2F_INS_REGISTER  0x01
#define U2F_INS_AUTHENTICATE  0x02
#define U2F_INS_VERSION  0x03


#define STATE_CHANNEL_AVAILABLE 0
#define STATE_CHANNEL_WAIT_PACKET 1
#define STATE_CHANNEL_WAIT_CONT 2
#define STATE_CHANNEL_TIMEOUT 3
#define STATE_LARGE_PACKET 4

#define MAX_TOTAL_PACKET 7609

#define MAX_INITIAL_PACKET 57
#define MAX_CONTINUATION_PACKET 59
#define SET_MSG_LEN(b, v) do { (b)[5] = ((v) >> 8) & 0xff;  (b)[6] = (v) & 0xff; } while(0)


#define U2FHID_IF_VERSION       2  // Current interface implementation version

byte expected_next_packet;
int large_data_len;
int large_data_offset;
byte large_buffer[1024];
byte large_resp_buffer[1024];
byte recv_buffer[64];
byte re
Download .txt
gitextract_0rtdnj14/

├── LICENSE
├── LICENSE-micro-ecc.txt
├── README.md
├── u2f/
│   ├── Makefile.desktop
│   ├── asm_arm.h
│   ├── asm_arm_mult_square.h
│   ├── curve-specific.h
│   ├── desktop_test.cpp
│   ├── platform-specific.h
│   ├── sha256.c
│   ├── sha256.h
│   ├── types.h
│   ├── u2f.ino
│   ├── uECC.c
│   ├── uECC.h
│   └── uECC_vli.h
└── usb_desc.h
Download .txt
SYMBOL INDEX (161 symbols across 10 files)

FILE: u2f/asm_arm.h
  function uECC_VLI_API (line 495) | uECC_VLI_API void uECC_vli_mult(uint32_t *result,
  function uECC_VLI_API (line 824) | uECC_VLI_API uECC_word_t uECC_vli_add(uECC_word_t *result,
  function uECC_VLI_API (line 856) | uECC_VLI_API uECC_word_t uECC_vli_sub(uECC_word_t *result,
  function uECC_VLI_API (line 888) | uECC_VLI_API void uECC_vli_mult(uECC_word_t *result,
  function uECC_VLI_API (line 1058) | uECC_VLI_API void uECC_vli_square(uECC_word_t *result,

FILE: u2f/curve-specific.h
  function double_jacobian_default (line 50) | static void double_jacobian_default(uECC_word_t * X1,
  function x_side_default (line 98) | static void x_side_default(uECC_word_t *result, const uECC_word_t *x, uE...
  function mod_sqrt_default (line 113) | static void mod_sqrt_default(uECC_word_t *a, uECC_Curve curve) {
  type uECC_Curve_t (line 139) | struct uECC_Curve_t
  function uECC_Curve (line 169) | uECC_Curve uECC_secp160r1(void) { return &curve_secp160r1; }
  function vli_mmod_fast_secp160r1 (line 178) | static void vli_mmod_fast_secp160r1(uECC_word_t *result, uECC_word_t *pr...
  function omega_mult_secp160r1 (line 201) | static void omega_mult_secp160r1(uint64_t *result, const uint64_t *right) {
  function vli_mmod_fast_secp160r1 (line 215) | static void vli_mmod_fast_secp160r1(uECC_word_t *result, uECC_word_t *pr...
  function omega_mult_secp160r1 (line 240) | static void omega_mult_secp160r1(uint8_t *result, const uint8_t *right) {
  function omega_mult_secp160r1 (line 257) | static void omega_mult_secp160r1(uint32_t *result, const uint32_t *right) {
  type uECC_Curve_t (line 284) | struct uECC_Curve_t
  function uECC_Curve (line 314) | uECC_Curve uECC_secp192r1(void) { return &curve_secp192r1; }
  function vli_mmod_fast_secp192r1 (line 320) | static void vli_mmod_fast_secp192r1(uint8_t *result, uint8_t *product) {
  function vli_mmod_fast_secp192r1 (line 352) | static void vli_mmod_fast_secp192r1(uint32_t *result, uint32_t *product) {
  function vli_mmod_fast_secp192r1 (line 378) | static void vli_mmod_fast_secp192r1(uint64_t *result, uint64_t *product) {
  type uECC_Curve_t (line 414) | struct uECC_Curve_t
  function uECC_Curve (line 449) | uECC_Curve uECC_secp224r1(void) { return &curve_secp224r1; }
  function mod_sqrt_secp224r1_rs (line 454) | static void mod_sqrt_secp224r1_rs(uECC_word_t *d1,
  function mod_sqrt_secp224r1_rss (line 472) | static void mod_sqrt_secp224r1_rss(uECC_word_t *d1,
  function mod_sqrt_secp224r1_rm (line 490) | static void mod_sqrt_secp224r1_rm(uECC_word_t *d2,
  function mod_sqrt_secp224r1_rp (line 518) | static void mod_sqrt_secp224r1_rp(uECC_word_t *d1,
  function mod_sqrt_secp224r1 (line 544) | static void mod_sqrt_secp224r1(uECC_word_t *a, uECC_Curve curve) {
  function vli_mmod_fast_secp224r1 (line 574) | static void vli_mmod_fast_secp224r1(uint8_t *result, uint8_t *product) {
  function vli_mmod_fast_secp224r1 (line 629) | static void vli_mmod_fast_secp224r1(uint32_t *result, uint32_t *product)
  function vli_mmod_fast_secp224r1 (line 680) | static void vli_mmod_fast_secp224r1(uint64_t *result, uint64_t *product)
  type uECC_Curve_t (line 736) | struct uECC_Curve_t
  function uECC_Curve (line 771) | uECC_Curve uECC_secp256r1(void) { return &curve_secp256r1; }
  function vli_mmod_fast_secp256r1 (line 778) | static void vli_mmod_fast_secp256r1(uint8_t *result, uint8_t *product) {
  function vli_mmod_fast_secp256r1 (line 883) | static void vli_mmod_fast_secp256r1(uint32_t *result, uint32_t *product) {
  function vli_mmod_fast_secp256r1 (line 981) | static void vli_mmod_fast_secp256r1(uint64_t *result, uint64_t *product) {
  type uECC_Curve_t (line 1071) | struct uECC_Curve_t
  function uECC_Curve (line 1106) | uECC_Curve uECC_secp256k1(void) { return &curve_secp256k1; }
  function double_jacobian_secp256k1 (line 1110) | static void double_jacobian_secp256k1(uECC_word_t * X1,
  function x_side_secp256k1 (line 1149) | static void x_side_secp256k1(uECC_word_t *result, const uECC_word_t *x, ...
  function vli_mmod_fast_secp256k1 (line 1157) | static void vli_mmod_fast_secp256k1(uECC_word_t *result, uECC_word_t *pr...
  function omega_mult_secp256k1 (line 1181) | static void omega_mult_secp256k1(uint8_t * result, const uint8_t * right) {
  function omega_mult_secp256k1 (line 1211) | static void omega_mult_secp256k1(uint32_t * result, const uint32_t * rig...
  function omega_mult_secp256k1 (line 1227) | static void omega_mult_secp256k1(uint64_t * result, const uint64_t * rig...

FILE: u2f/desktop_test.cpp
  type OUTPUT_FORMAT_ENUM (line 32) | enum OUTPUT_FORMAT_ENUM
  function hexchar2int (line 40) | int hexchar2int(int c)
  function hex2bytes (line 47) | void hex2bytes(const std::string & inp, unsigned char **res, int *len)
  function get_next_fake_input (line 70) | int get_next_fake_input(unsigned char **res)
  function read_file (line 81) | void read_file(const std::string & filename)
  function RNG (line 92) | int RNG(uint8_t *dest, unsigned size)
  function system_millis (line 100) | long system_millis()
  function millis (line 123) | int millis()
  function delayMicroseconds (line 129) | void delayMicroseconds(int micro)
  class EEPROMClass (line 134) | class EEPROMClass {
    method get (line 137) | void get(int address, unsigned int &value)
    method put (line 141) | void put(int address, int value)
  class SerialClass (line 147) | class SerialClass {
    method begin (line 150) | void begin(int speed) {
    method print (line 152) | void print(const char *msg)
    method println (line 156) | void println() {
    method println (line 159) | void println(const char *msg)
    method println (line 163) | void println(int number)
    method print (line 167) | void print(int number, OUTPUT_FORMAT_ENUM e)
    method println (line 171) | void println(int number, OUTPUT_FORMAT_ENUM e)
  class RawHIDClass (line 177) | class RawHIDClass {
    method send (line 179) | void send(byte *buffer, int to)
    method recv (line 187) | int recv(byte *buffer, int timeout)
  function main (line 213) | int main(int argc, char *argv[])

FILE: u2f/platform-specific.h
  function default_RNG (line 15) | static int default_RNG(uint8_t *dest, unsigned size) {
  function default_RNG (line 39) | static int default_RNG(uint8_t *dest, unsigned size) {

FILE: u2f/sha256.c
  function sha256_transform (line 51) | void sha256_transform(SHA256_CTX *ctx, const BYTE data[])
  function sha256_init (line 92) | void sha256_init(SHA256_CTX *ctx)
  function sha256_update (line 106) | void sha256_update(SHA256_CTX *ctx, const BYTE data[], size_t len)
  function sha256_final (line 121) | void sha256_final(SHA256_CTX *ctx, BYTE hash[])

FILE: u2f/sha256.h
  type BYTE (line 20) | typedef unsigned char BYTE;
  type WORD (line 21) | typedef unsigned int  WORD;
  type SHA256_CTX (line 23) | typedef struct {

FILE: u2f/types.h
  type wordcount_t (line 60) | typedef int8_t wordcount_t;
  type bitcount_t (line 61) | typedef int16_t bitcount_t;
  type cmpresult_t (line 62) | typedef int8_t cmpresult_t;
  type uECC_word_t (line 66) | typedef uint8_t uECC_word_t;
  type uECC_dword_t (line 67) | typedef uint16_t uECC_dword_t;
  type uECC_word_t (line 76) | typedef uint32_t uECC_word_t;
  type uECC_dword_t (line 77) | typedef uint64_t uECC_dword_t;
  type uECC_word_t (line 86) | typedef uint64_t uECC_word_t;
  type uECC_dword_t (line 88) | typedef unsigned __int128 uECC_dword_t;

FILE: u2f/uECC.c
  type uECC_Curve_t (line 139) | struct uECC_Curve_t {
  function uECC_set_rng (line 179) | void uECC_set_rng(uECC_RNG_Function rng_function) {
  function uECC_VLI_API (line 184) | uECC_VLI_API void uECC_vli_clear(uECC_word_t *vli, wordcount_t num_words) {
  function uECC_VLI_API (line 194) | uECC_VLI_API uECC_word_t uECC_vli_isZero(const uECC_word_t *vli, wordcou...
  function uECC_VLI_API (line 204) | uECC_VLI_API uECC_word_t uECC_vli_testBit(const uECC_word_t *vli, bitcou...
  function wordcount_t (line 209) | static wordcount_t vli_numDigits(const uECC_word_t *vli, const wordcount...
  function uECC_VLI_API (line 220) | uECC_VLI_API bitcount_t uECC_vli_numBits(const uECC_word_t *vli, const w...
  function uECC_VLI_API (line 239) | uECC_VLI_API void uECC_vli_set(uECC_word_t *dest, const uECC_word_t *src...
  function cmpresult_t (line 248) | static cmpresult_t uECC_vli_cmp_unsafe(const uECC_word_t *left,
  function uECC_VLI_API (line 264) | uECC_VLI_API uECC_word_t uECC_vli_equal(const uECC_word_t *left,
  function uECC_VLI_API (line 281) | uECC_VLI_API cmpresult_t uECC_vli_cmp(const uECC_word_t *left,
  function uECC_VLI_API (line 292) | uECC_VLI_API void uECC_vli_rshift1(uECC_word_t *vli, wordcount_t num_wor...
  function uECC_VLI_API (line 307) | uECC_VLI_API uECC_word_t uECC_vli_add(uECC_word_t *result,
  function uECC_VLI_API (line 326) | uECC_VLI_API uECC_word_t uECC_vli_sub(uECC_word_t *result,
  function muladd (line 346) | static void muladd(uECC_word_t a,
  function uECC_VLI_API (line 388) | uECC_VLI_API void uECC_vli_mult(uECC_word_t *result,
  function mul2add (line 423) | static void mul2add(uECC_word_t a,
  function uECC_VLI_API (line 470) | uECC_VLI_API void uECC_vli_square(uECC_word_t *result,
  function uECC_VLI_API (line 501) | uECC_VLI_API void uECC_vli_square(uECC_word_t *result,
  function uECC_VLI_API (line 512) | uECC_VLI_API void uECC_vli_modAdd(uECC_word_t *result,
  function uECC_VLI_API (line 526) | uECC_VLI_API void uECC_vli_modSub(uECC_word_t *result,
  function uECC_VLI_API (line 541) | uECC_VLI_API void uECC_vli_mmod(uECC_word_t *result,
  function uECC_VLI_API (line 584) | uECC_VLI_API void uECC_vli_modMult(uECC_word_t *result,
  function uECC_VLI_API (line 594) | uECC_VLI_API void uECC_vli_modMult_fast(uECC_word_t *result,
  function uECC_VLI_API (line 611) | uECC_VLI_API void uECC_vli_modSquare(uECC_word_t *result,
  function uECC_VLI_API (line 621) | uECC_VLI_API void uECC_vli_modSquare_fast(uECC_word_t *result,
  function uECC_VLI_API (line 636) | uECC_VLI_API void uECC_vli_modSquare(uECC_word_t *result,
  function uECC_VLI_API (line 644) | uECC_VLI_API void uECC_vli_modSquare_fast(uECC_word_t *result,
  function vli_modInv_update (line 653) | static void vli_modInv_update(uECC_word_t *uv,
  function uECC_VLI_API (line 668) | uECC_VLI_API void uECC_vli_modInv(uECC_word_t *result,
  function apply_z (line 725) | static void apply_z(uECC_word_t * X1,
  function XYcZ_initial_double (line 738) | static void XYcZ_initial_double(uECC_word_t * X1,
  function XYcZ_add (line 765) | static void XYcZ_add(uECC_word_t * X1,
  function XYcZ_addC (line 796) | static void XYcZ_addC(uECC_word_t * X1,
  function EccPoint_mult (line 834) | static void EccPoint_mult(uECC_word_t * result,
  function uECC_word_t (line 879) | static uECC_word_t regularize_k(const uECC_word_t * const k,
  function uECC_word_t (line 892) | static uECC_word_t EccPoint_compute_public_key(uECC_word_t *result,
  function uECC_VLI_API (line 914) | uECC_VLI_API void uECC_vli_nativeToBytes(uint8_t *bytes,
  function uECC_VLI_API (line 923) | uECC_VLI_API void uECC_vli_bytesToNative(uint8_t *native,
  function uECC_VLI_API (line 931) | uECC_VLI_API void uECC_vli_nativeToBytes(uint8_t *bytes,
  function uECC_VLI_API (line 941) | uECC_VLI_API void uECC_vli_bytesToNative(uECC_word_t *native,
  function uECC_VLI_API (line 957) | uECC_VLI_API int uECC_generate_random_int(uECC_word_t *random,
  function uECC_make_key (line 981) | int uECC_make_key(uint8_t *public_key,
  function uECC_shared_secret (line 1004) | int uECC_shared_secret(const uint8_t *public_key,
  function uECC_compress (line 1040) | void uECC_compress(const uint8_t *public_key, uint8_t *compressed, uECC_...
  function uECC_decompress (line 1048) | void uECC_decompress(const uint8_t *compressed, uint8_t *public_key, uEC...
  function uECC_valid_point (line 1064) | int uECC_valid_point(const uECC_word_t *point, uECC_Curve curve) {
  function uECC_valid_public_key (line 1087) | int uECC_valid_public_key(const uint8_t *public_key, uECC_Curve curve) {
  function uECC_compute_public_key (line 1096) | int uECC_compute_public_key(const uint8_t *private_key, uint8_t *public_...
  function bits2int (line 1125) | static void bits2int(uECC_word_t *native,
  function uECC_sign_with_k (line 1154) | static int uECC_sign_with_k(const uint8_t *private_key,
  function uECC_sign (line 1212) | int uECC_sign(const uint8_t *private_key,
  function HMAC_init (line 1234) | static void HMAC_init(uECC_HashContext *hash_context, const uint8_t *K) {
  function HMAC_update (line 1246) | static void HMAC_update(uECC_HashContext *hash_context,
  function HMAC_finish (line 1252) | static void HMAC_finish(uECC_HashContext *hash_context, const uint8_t *K...
  function update_V (line 1269) | static void update_V(uECC_HashContext *hash_context, uint8_t *K, uint8_t...
  function uECC_sign_deterministic (line 1281) | int uECC_sign_deterministic(const uint8_t *private_key,
  function bitcount_t (line 1354) | static bitcount_t smax(bitcount_t a, bitcount_t b) {
  function uECC_verify (line 1358) | int uECC_verify(const uint8_t *public_key,
  function uECC_curve_num_words (line 1463) | unsigned uECC_curve_num_words(uECC_Curve curve) {
  function uECC_curve_num_bytes (line 1467) | unsigned uECC_curve_num_bytes(uECC_Curve curve) {
  function uECC_curve_num_bits (line 1471) | unsigned uECC_curve_num_bits(uECC_Curve curve) {
  function uECC_curve_num_n_words (line 1475) | unsigned uECC_curve_num_n_words(uECC_Curve curve) {
  function uECC_curve_num_n_bytes (line 1479) | unsigned uECC_curve_num_n_bytes(uECC_Curve curve) {
  function uECC_curve_num_n_bits (line 1483) | unsigned uECC_curve_num_n_bits(uECC_Curve curve) {
  function uECC_word_t (line 1487) | const uECC_word_t *uECC_curve_p(uECC_Curve curve) {
  function uECC_word_t (line 1491) | const uECC_word_t *uECC_curve_n(uECC_Curve curve) {
  function uECC_word_t (line 1495) | const uECC_word_t *uECC_curve_G(uECC_Curve curve) {
  function uECC_word_t (line 1499) | const uECC_word_t *uECC_curve_b(uECC_Curve curve) {
  function uECC_vli_mod_sqrt (line 1504) | void uECC_vli_mod_sqrt(uECC_word_t *a, uECC_Curve curve) {
  function uECC_vli_mmod_fast (line 1509) | void uECC_vli_mmod_fast(uECC_word_t *result, uECC_word_t *product, uECC_...
  function uECC_point_mult (line 1517) | void uECC_point_mult(uECC_word_t *result,

FILE: u2f/uECC.h
  type uECC_Curve_t (line 61) | struct uECC_Curve_t
  type uECC_Curve_t (line 62) | struct uECC_Curve_t
  type uECC_HashContext (line 267) | typedef struct uECC_HashContext {

FILE: usb_desc.h
  type usb_descriptor_list_t (line 317) | typedef struct {
Condensed preview — 17 files, each showing path, character count, and a content snippet. Download the .json file or copy for the full structured content (330K chars).
[
  {
    "path": "LICENSE",
    "chars": 1299,
    "preview": "Copyright (c) 2015, Yohanes Nugroho\nAll rights reserved.\n\nRedistribution and use in source and binary forms, with or wit"
  },
  {
    "path": "LICENSE-micro-ecc.txt",
    "chars": 1300,
    "preview": "Copyright (c) 2014, Kenneth MacKay\nAll rights reserved.\n\nRedistribution and use in source and binary forms, with or with"
  },
  {
    "path": "README.md",
    "chars": 1049,
    "preview": "teensy-u2f\n==========\n\nU2F implementation for Teensy LC. \n\nThis implementation is simple, works, but a bit insecure in t"
  },
  {
    "path": "u2f/Makefile.desktop",
    "chars": 208,
    "preview": "\nall: desktop_test\n\nuECC.o\t: uECC.c\n\tgcc -Wall -c uECC.c\n\ndesktop_test: desktop_test.cpp sha256.c u2f.ino uECC.o\n\t      "
  },
  {
    "path": "u2f/asm_arm.h",
    "chars": 51389,
    "preview": "/* Copyright 2015, Kenneth MacKay. Licensed under the BSD 2-clause license. */\n\n#ifndef _UECC_ASM_ARM_H_\n#define _UECC_A"
  },
  {
    "path": "u2f/asm_arm_mult_square.h",
    "chars": 75464,
    "preview": "/* Copyright 2015, Kenneth MacKay. Licensed under the BSD 2-clause license. */\n\n#ifndef _UECC_ASM_ARM_MULT_SQUARE_H_\n#de"
  },
  {
    "path": "u2f/curve-specific.h",
    "chars": 50915,
    "preview": "/* Copyright 2015, Kenneth MacKay. Licensed under the BSD 2-clause license. */\n\n#ifndef _UECC_CURVE_SPECIFIC_H_\n#define "
  },
  {
    "path": "u2f/desktop_test.cpp",
    "chars": 3999,
    "preview": "//to prevent arduino IDE from compiling this\n#ifdef IS_DESKTOP_TEST\n\n//test in desktop\n#define _POSIX_C_SOURCE 200809L\n\n"
  },
  {
    "path": "u2f/platform-specific.h",
    "chars": 1621,
    "preview": "/* Copyright 2015, Kenneth MacKay. Licensed under the BSD 2-clause license. */\n\n#ifndef _UECC_PLATFORM_SPECIFIC_H_\n#defi"
  },
  {
    "path": "u2f/sha256.c",
    "chars": 5355,
    "preview": "/*********************************************************************\n* Filename:   sha256.c\n* Author:     Brad Conte ("
  },
  {
    "path": "u2f/sha256.h",
    "chars": 1289,
    "preview": "/*********************************************************************\n* Filename:   sha256.h\n* Author:     Brad Conte ("
  },
  {
    "path": "u2f/types.h",
    "chars": 2690,
    "preview": "/* Copyright 2015, Kenneth MacKay. Licensed under the BSD 2-clause license. */\n\n#ifndef _UECC_TYPES_H_\n#define _UECC_TYP"
  },
  {
    "path": "u2f/u2f.ino",
    "chars": 24758,
    "preview": "#ifndef DESKTOP_TEST\n#include <EEPROM.h>\n#endif\n#include <string.h>\n\n#include \"sha256.h\"\n#include \"uECC.h\"\n\n#undef DEBUG"
  },
  {
    "path": "u2f/uECC.c",
    "chars": 53074,
    "preview": "/* Copyright 2014, Kenneth MacKay. Licensed under the BSD 2-clause license. */\n\n#include \"uECC.h\"\n#include \"uECC_vli.h\"\n"
  },
  {
    "path": "u2f/uECC.h",
    "chars": 12128,
    "preview": "/* Copyright 2014, Kenneth MacKay. Licensed under the BSD 2-clause license. */\n\n#ifndef _UECC_H_\n#define _UECC_H_\n\n#incl"
  },
  {
    "path": "u2f/uECC_vli.h",
    "chars": 6927,
    "preview": "/* Copyright 2015, Kenneth MacKay. Licensed under the BSD 2-clause license. */\n\n#ifndef _UECC_VLI_H_\n#define _UECC_VLI_H"
  },
  {
    "path": "usb_desc.h",
    "chars": 12924,
    "preview": "/* Teensyduino Core Library\n * http://www.pjrc.com/teensy/\n * Copyright (c) 2013 PJRC.COM, LLC.\n *\n * Permission is here"
  }
]

About this extraction

This page contains the full source code of the yohanes/teensy-u2f GitHub repository, extracted and formatted as plain text for AI agents and large language models (LLMs). The extraction includes 17 files (299.2 KB), approximately 105.9k tokens, and a symbol index with 161 extracted functions, classes, methods, constants, and types. Use this with OpenClaw, Claude, ChatGPT, Cursor, Windsurf, or any other AI tool that accepts text input. You can copy the full output to your clipboard or download it as a .txt file.

Extracted by GitExtract — free GitHub repo to text converter for AI. Built by Nikandr Surkov.

Copied to clipboard!