gao19building

Title of the paper: Building Secure SRAM PUF Key Generators on Resource Constrained Devices

Available at: https://ieeexplore.ieee.org/abstract/document/8730781

1 Abstract
2 Introduction
3 Goal
4 (Reverse) Fuzzy Extractor
5 Reducing Reverse Fuzzy Extractor Overhead
6 Experimental Validations
- 6.1 SRAM PUF Evaluation
- 6.2 Overhead Evaluation
  - 6.2.1 Testing Setup
  - 6.2.2 Overhead Result and Comparison

Abstract

A securely maintained key is the premise upon which data stored and transmitted by ubiquitously deployed resource limited devices, such as those in the Internet of Things (IoT), are protected. However, many of these devices lack a secure non-volatile memory (NVM) for storing keys because of cost constraints. Silicon physical unclonable functions (PUFs) offering unique device specific secrets to electronic commodities are a low-cost alternative to secure NVM. As a physical hardware security primitive, reliability of a PUF is affected by thermal noise and changes in environmental conditions; consequently, PUF responses cannot be directly employed as cryptographic keys. A fuzzy extractor can turn noisy PUF responses into usable cryptographic keys. However, a fuzzy extractor is not immediately mountable on (highly) resource constrained devices due to its implementation overhead. We present a methodology for constructing a lightweight and secure PUF key generator for resource limited devices. In particular, we focus on PUFs constructed from pervasively embedded SRAM in modern microcontroller units and use a batteryless computational radio frequency identification (CRFID) device as a representative resource constrained IoT device in a case study.

Introduction

It has been shown that securing digital key storage is non-trivial in practice due to e.g., technical or cost limitations [1] → Key derivation with physical unclonable functions
PUFs can replace the functionality of the NVM based keys while increasing the security level of cryptographic key storage using standard CMOS technology based hardware security primitives. Overall, silicon PUFs offer:
1. low fabrication costs [1]
2. uniqueness (an inseparable fingerprint of a hardware instance)
3. enhanced resistance to attacks, especially, invasive attacks [4]–[6]
a PUF based key generator utilizes a fuzzy extractor to turn unstable responses into a stable and uniformly distributed cryptographic key.

Goal

Among the full implementations, majority are hardware implementations on FPGA platforms.

Further, to the best of our knowledge, there is only one end-to-end software PUF key generator implementation on a low power computing platform in the form of a highly resource constrained and batteryless device that relies only on harvested RF energy for operations [22]. → SecuCode: Intrinsic PUF Entangled Secure Wireless Code Dissemination for Computational RFID Devices

Therefore, we aim to provide a simple and timely guide to the Ubiquitous Computing community for realizing PUF based key generators to benefit from their inherent key security.

We consider the use of freely available SRAM on low cost and low power MCUs to function as an intrinsic
PUF to derive a secure and reliable cryptographic key.
We consider a methodology for realizing lightweight implementations of reliable key generators capable of execution on low cost and ultra low power microcontrollers.

(Reverse) Fuzzy Extractor

To turn the raw noisy PUF responses—for example, from an SRAM PUF—into a cryptographic key, a widely accepted approach is to employ a fuzzy extractor.

In general, the secure sketch construction has a pair of functions: Gen() and Rep(). There are two prevalent secure sketch constructions: i) code-offset construction; and ii) syndrome construction [23]. We choose the later, mainly considering its security.

During key enrollment phase, helper data p is computed by using Gen(r), where p = r x H^T and H is a
parity check matrix of a linear error correction code. Key reconstruction described by Rep(r',p)—where r' is the reproduced response that may be slightly different from the enrolled response r—first constructs a syndrome s = (r' x H^T) ⊕ p = e ⊕ H^T, with e an error vector. Then through an error location algorithm, e is determined. Subsequently, the response r is recovered through r = e ⊕ r'.
The recovered PUF response r may not be uniformly distributed. Therefore, an entropy extraction step, using a universal hash function for example, is employed to generate a cryptographic key with full bit entropy.
→ Although the entropy extractor realized with a hash function can be lightweight, the overhead of the secure sketch is comparatively more dominant.

In a fuzzy extractor (FE) setting, the Gen() function is performed by the server during the enrollment phase to compute helper data while the Rep() function is implemented on the field deployed token.

By recognizing that the computational burden of the Rep() function is significantly more than that of Gen() function, Van Herrewege et al. [19] placed Gen() on the resource-constraint token while leaving the computationally heavy Rep() function execution to the resource-rich server; this arrangement is termed the reverse fuzzy extractor (RFE).

→ The authors will focus on the correct recovery in the context of resource limited devices and a RFE where the Gen() function is implemented on the resource limited token in the next sections.

	Gen() function	Rep() function → heavier

	Gen() function	Rep() function → heavier
fuzzy extractor (FE)	on server during enrollment	on resource-constrained device
reverse fuzzy extractor (RFE)	on resource-constrained device	on server

Reducing Reverse Fuzzy Extractor Overhead

1) Response Pre-Processing

The overhead of implementing the Gen() function is directly related to the response unreliability or bit error rate (BER) → noisier responses naturally require a higher error correction overhead.

Therefore, it is imperative to reduce the expected BER of PUF responses on a token. In this context, response pre-processing methodologies, e.g.:

majority voting: given q repeated response measurements under an operating condition, the response value (‘1’/‘0’) enrolled is that which shows no less than floor(q/2) measurements; with q an odd integer.
reliable response pre-selection: each PUF response under an operating condition is repeatedly measured i times, only response bits that exhibit 100% reliable regeneration (all ‘1’s/‘0’s) are selected for use and enrollment.

2) Key Generation with Multiple Reference Responses (MRR)

During the enrollment phase, in contrast to conventional single response enrollment under a nominal operating condition, e.g., room temperature of 25C, the server enrolls J reference responses {r₁, …, r_J} subject to the same challenge c applied to the same PUF but under J different operating conditions.

We can observe that when the difference between the reference operating condition at enrollment increases with respect to the condition during a regenerated response, the the expected BER increases.

(check the article for more details)

Although the server is unable to control the operating condition under which the response r' is regenerated, we can observe that the server now has the capability of choosing one of multiple reference responses to ensure that the chosen reference response is close to the operating condition under which the r' is regenerated.

→ This ability greatly relaxes the error correction requirement and, thus, the implementation overhead of the Gen() function.

3) Key Failure Rate

This study uses the family of BCH(n, k, t) linear codes with a syndrome based decoding strategy to realize a reverse fuzzy extractor considering its popularity and security, where:

n is the codeword length,
k is the code size,
t is the number of errors that can be corrected within this n-bit block.

A BCH(n, k, t) encoding produces (n - k)-bit helper data that will be publicly known while the k bits contribute to the secret.

For a single BCH(n, k, t) block, the complexity of finding the k-bit extracted key from the n-bit response r' is 2^k. It is not common to use a single large BCH(n, k, t) block; typically a large block is split into small processing blocks to reduce implementation complexity. For k bits of key material, response r' can be divided into multiple non-overlapping blocks of a BCH(n1, k1, t1) code where n1 < n and k1 < k for a parallel implementation. Now the complexity of finding the k bit secret is 2^k1L

(check the article for more details)

Experimental Validations

SRAM PUF Evaluation

The SRAM PUF reliability is much less sensitive to voltage variations compared with temperature fluctuations attributing to the SRAM cell’s symmetric structure.

For single reference response enrollment, BER under three different enrollment strategies are applied:

one-time response measurement (none),
majority voting (MV) with 9 repeated measurements (q = 9),
preselection (PreSel) with 10 repeated measurements.

→ The PreSel outperforms others.

For the reference response enrolled under 25C, preselection is applied to select reliable responses. Then reference response under -15C, 0C, 40C and 80C applies response majority voting on the pre-selected reliable responses.

Overhead Evaluation

Testing Setup

The overhead of software implementation is assessed in terms of clock cycles to complete the algorithm.

Two key components of the PUF key generator are the hash function and the BCH code encoding/decoding —Gen()/Rep()—blocks.

Based on the hash function evaluations in [22], we selected BLAKE2s-128 as it showed the best performance, in term of clock cycles.
As for BCH(n1; k1; t1) code, it is imperative to use a small n1 and t1.
- First, the BCH(n1; k1; t1) code overhead decreases sharply when the codeword length n1 decreases.
- Second, a small t1 implies a correspondingly large k1. With a large k1, to achieve the security level of k-bit secret key sk, the needed number of BCH(n1; k1; t1) blocks—L = ciel(k/k₁)—will reduce, thus, reducing the overall overhead to gain k-bit entropy for the secret key sk.

Overhead Result and Comparison

Fuzzy Extractor with Single Reference Response:
To achieve P_fail < 10^-6, nine BCH(127,15,27) blocks are required giving P_fail = 1.66 x 10^-7 when the single reference response under 25C is used. One BCH(127,15,27) block decoding consumes 2,102,222 clock cycles. The fuzzy extractor needs to sequentially execute BCH(127,15,27) decoding blocks 9 times and two BLACKE2s-128 hash operations. In total, it consumes up to 19,127,446 clock cycles.
Reverse Fuzzy Extractor with Single Reference Response:
To achieve P_fail < 10^-6, nine BCH(127,15,27) blocks are required giving P_fail = 1.66 x 10^-7 when the single reference response under 25C is used. Notably, the token only needs to perform the encoding operation requiring significantly less clock cycles in comparison with decoding. One BCH(127,15,27) block encoding consumes 111,335 clock cycles. This reverse fuzzy extractor needs to sequentially execute BCH(127,15,27) encoding 9 times and two BLACKE2s-128 hash operations. In total, it consumes up to 1,211,461 clock cycles.
Reverse Fuzzy Extractor with Three Reference Response:
To achieve P_fail < 10^-6, eight smaller BCH(63,16,11) blocks are adequate to meet the requirement with P_fail = 9.85 x 10^-7. Here, 3MRR under -15C, 25C, 80C are used. One BCH(63,16,11) block encoding consumes 51,003 clock cycles. This reverse fuzzy extractor needs to sequentially execute BCH(63,16,11) encoding 8 times and two BLACKE2s-128 hash operations. In total, it consumes up to 617,470 clock cycles.

The reverse fuzzy extractor overhead is only 6.3% of the fuzzy extractor when a single reference response is deployed to achieve the same key failure rate performance. When the MRR is adopted by the reverse fuzzy extractor, the clock cycle overhead is further reduced by 49% in comparison with a single reference response.

Security Discussion

(Important section: check the article for more details)

The bias of our tested SRAM PUF is 49.87%. Therefore, the extra entropy loss is very small.

An example of employing a HW based debiasing method for key derivation with a resource limited devices is detailed in [22]. → SecuCode: Intrinsic PUF Entangled Secure Wireless Code Dissemination for Computational RFID Devices

Sara's PHD