Update on Overleaf.

2025-12-08 05:27:59 +00:00 · 2025-07-15 15:59:23 +00:00
parent ab40f6e909
commit e702a2aa6f
11 changed files with 864 additions and 723 deletions
--- a/Chapters/Implementation.tex
+++ b/Chapters/Implementation.tex
@@ -24,6 +24,7 @@
 % - in the following sections I will go into detail on how each implementation works

 The primary goal of this thesis is to conduct a security analysis of commercial \gls{esim} implementations through differential testing. We adopt a systematic approach to compare the behavior of different \gls{euicc} implementations under identical inputs to uncover inconsistencies and potential vulnerabilities. Our focus lies particularly on components and behaviors that differentiate traditional \gls{sim} cards from \glspl{esim}, such as profile download and profile management capabilities.
+\marginpar{Implementation focus is on behaviors unique to eSIMs, like profile download and management.}

 To perform differential testing, we designed a structured fuzzing methodology that employs both valid and mutated \gls{apdu} sequences. By observing and comparing how multiple \glspl{euicc} respond to the same inputs, we aim to uncover deviations that may indicate security flaws or implementation weaknesses.

@@ -41,9 +42,10 @@ To perform differential testing, we designed a structured fuzzing methodology th
 % - recording: represents a list of recorded \glspl{apdu}, handles source and target isd-r addresses, file saving and loading as well as checking if the file is replayable
 % - replay: establishes connection to pcsc via pcsc link, loads recorded \glspl{apdu} and sends them over the link to the connected euicc, switches out source isd-r and target isd-r during replay, compares response status word to recorded status word and prints an error if there is a difference

-We built the tracing component based on Design 1 in \cref{subsec:design_1} to capture and interpret \glspl{apdu} exchanged between an \gls{lpa} (or other source) and the \gls{euicc}, and to replay them by inserting the recorded \glspl{apdu} into the communication between the \gls{lpa} and the \gls{euicc}. This forms the foundation of the differential testing framework by allowing the same interaction sequence to be executed across multiple \glspl{euicc} for behavioral comparison.
+We build the tracing component based on Design 1 in \cref{subsec:design_1} to capture and interpret \glspl{apdu} exchanged between an \gls{lpa} (or other source) and the \gls{euicc}, and to replay them by inserting the recorded \glspl{apdu} into the communication between the \gls{lpa} and the \gls{euicc}. This forms the foundation of the differential testing framework by allowing the same interaction sequence to be executed across multiple \glspl{euicc} for behavioral comparison.

 Our tracing functionality comprises two main operations:
+\marginpar{Inject recorded APDUs into a card reader session, adjusting IDs and checking responses.}

 \begin{itemize}
    \item \textbf{Tracing and recording:} Captures \glspl{apdu} traffic from a physical interface using \texttt{simtrace2}~\cite{osmocom_simtrace_nodate} and associates it with functional interpretations (e.g., profile enablement, deletion). The \glspl{apdu} are parsed and stored along with contextual information such as sender and receiver addresses.
@@ -65,8 +67,10 @@ The implementation consists of several key components as shown in \cref{img:clas
    \item[\texttt{Card}] Represents a connected card in a PC/SC reader. It queries the card to determine its type (e.g., standard \gls{sim}, test \gls{euicc}, or commercial \gls{euicc}), and identifies installed applications such as \gls{isdr} or \gls{ecasd}. The class serves as the interface for sending \glspl{apdu} to the card through the \texttt{pcsc\_link}.

    \item[\texttt{Tracer}] A dummy implementation of the \texttt{Card} interface used during passive tracing. It parses incoming \glspl{apdu} from the GSMTAP interface using \texttt{pysim} and attempts to classify them based on instruction type. This allows mapping observed \glspl{apdu} to functional operations.
+    \marginpar{Card, Tracer, and Recorder handle active interaction, passive capture, and session logging for eUICC testing.}

    \item[\texttt{Recorder}] Coordinates tracing and recording. It spawns a separate tracer thread that listens for \glspl{apdu} from GSMTAP in a loop until a timeout occurs or a stop signal is issued. \glspl{apdu} are recorded alongside the designated target \gls{isdr} for later analysis.
+    \marginpar{Recording and Replayer store traced sessions and replay them to detect behavioral differences across eUICCs.}

    \item[\texttt{recording}] An abstraction for a recorded session. It stores the list of \glspl{apdu}, associated source and target \texttt{\gls{isdr}} addresses, and metadata. It provides serialization functions for saving to and loading from disk, as well as validity checks to determine whether a recording is replayable.

@@ -148,7 +152,7 @@ This modular structure allows for easy integration into both automated test pipe
 % before returning the data to the caller -> client checks for error on server and eventually raises the corresponding exception -> as explained in the exception handling part
 % smdp+ client is mostly used by the isd-r

-Due to the inability of the \texttt{tracer} implementation to accurately replay \gls{rsp} interactions, we developed a dedicated \gls{lpa} to initiate valid interactions with the \gls{euicc}. This custom \gls{lpa} provides us with full control over the generation and mutation of traffic, enabling structured and repeatable interaction patterns. We describe the mutation and fuzzing strategies enabled by this setup in detail in \cref{sec:fuzzing}. Our implementation specifically targets the SGP.22 v3.1 specification, which, at the time of writing, represented the most recent version available~\cite{gsma_sgp22_2025}.
+Due to the inability of the \texttt{tracer} implementation to accurately replay \gls{rsp} interactions, we developed a dedicated \gls{lpa} to initiate valid interactions with the \gls{euicc}.\marginpar{Custom LPA enables precise and controlled RSP interactions with eUICCs.} This custom \gls{lpa} provides us with full control over the generation and mutation of traffic, enabling structured and repeatable interaction patterns. We describe the mutation and fuzzing strategies enabled by this setup in detail in \cref{sec:fuzzing}. Our implementation specifically targets the SGP.22 v3.1 specification, which, at the time of writing, represented the most recent version available~\cite{gsma_sgp22_2025}.

 The \gls{lpa} is composed of multiple components:

@@ -156,15 +160,17 @@ The \gls{lpa} is composed of multiple components:
 Represents the \gls{euicc} currently inserted into the PC/SC card reader. Upon initialization, it scans the card for supported applications, identifying the applicable \gls{adf} through probing. This is necessary as eSIM-on-SIM implementations often use proprietary \glspl{adf}, diverging from the \glspl{adf} specified in the SGP.22 standard as we will evaluate in \cref{sec:eval_tracing}. The card object keeps track of the selected application to reduce unnecessary reselection and traffic.

 \paragraph{PC/SC Link.}
+\marginpar{PC/SC Link maintains session and supports APDU mutation.}
 This component is based on \texttt{pySim}'s \texttt{LinkBaseTpdu}. It establishes an exclusive connection to the PC/SC reader to maintain session state consistency, which is required due to the stateful nature of \gls{euicc} interactions. During initialization:
 \begin{itemize}
  \item The supported transmission protocol (T=0 or T=1) is detected.
  \item A connection is established and validated.
 \end{itemize}
-It handles both \gls{apdu} and \gls{tpdu} transmission, automatically requesting additional data when status words such as \texttt{9FXX}, \texttt{61XX}, \texttt{62XX}, or \texttt{63XX} are encountered. When enabled, it invokes an optional mutation engine before sending \glspl{apdu} (see \cref{subsec:apdu_fuzzing}) and also records all traffic for later analysis.
+It handles both \gls{apdu} and \gls{tpdu} transmission, automatically requesting additional data when status words such as \texttt{9FXX}, \texttt{61XX}, \texttt{62XX}, or \texttt{63XX} are encountered, as further detailed in \cref{sec:sw_codes}. When enabled, it invokes an optional mutation engine before sending \glspl{apdu} (see \cref{subsec:apdu_fuzzing}) and also records all traffic for later analysis.
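The automatic retrieval of pending response data can be illustrated with a minimal Python sketch. It covers only the ISO 7816-4 \texttt{61XX} case, in which the low byte of the status word gives the number of bytes to fetch with GET RESPONSE (\texttt{INS = C0}). The names \texttt{transceive} and \texttt{FakeCard}, the callable transport, and the fixed class byte are assumptions made for this illustration and are not taken from \texttt{pySim} or from our implementation.

```python
from typing import Callable, Tuple

# Hypothetical transport: takes a command APDU, returns (response data, status word).
Transport = Callable[[bytes], Tuple[bytes, int]]


def transceive(send: Transport, apdu: bytes, cla: int = 0x00) -> Tuple[bytes, int]:
    """Send an APDU and keep issuing GET RESPONSE while the card answers 61XX."""
    data, sw = send(apdu)
    collected = bytearray(data)
    while (sw >> 8) == 0x61:
        le = sw & 0xFF  # number of bytes the card still offers
        # GET RESPONSE: CLA, INS=C0, P1=00, P2=00, Le (class byte is a simplification)
        data, sw = send(bytes([cla, 0xC0, 0x00, 0x00, le]))
        collected += data
    return bytes(collected), sw


class FakeCard:
    """Stand-in for a card that hands out its response in small chunks."""

    def __init__(self, response: bytes, chunk: int) -> None:
        self.pending = response
        self.chunk = chunk

    def __call__(self, apdu: bytes) -> Tuple[bytes, int]:
        part = b""
        if apdu[1] == 0xC0:  # GET RESPONSE: release up to Le bytes
            part, self.pending = self.pending[: apdu[4]], self.pending[apdu[4]:]
        if self.pending:
            return part, 0x6100 | min(len(self.pending), self.chunk)
        return part, 0x9000
```

Calling \texttt{transceive(FakeCard(b"hello", chunk=3), command)} collects the full response over two GET RESPONSE round trips. Handling of \texttt{9FXX}, \texttt{62XX}, and \texttt{63XX} is deliberately left out of this sketch.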

 \paragraph{Application.}
 Each \gls{euicc} application (\eg, \gls{isdr}, \gls{ecasd}) is implemented with application-specific logic and communicates with the card via the \texttt{pcsc\_link}. The application layer abstracts encoding/decoding and command sending. For instance, the \texttt{store\_data} command is handled internally using \texttt{asn1tools} for encoding and decoding.
+\marginpar{Application represents an eUICC applet, encodes commands, and decodes responses.}

 Known \glspl{adf} for \gls{isdr} observed during analysis:
 \begin{itemize}
@@ -179,10 +185,10 @@ To decoded response data for further processing, we use \texttt{pydantic} data c
 The \texttt{estk\_fwupd} application implements a proprietary firmware update interface, which we reverse-engineered (see \cref{sec:eval_tracing}). It supports reading the current firmware version, unlocking\footnote{This unlocking is distinct from \gls{gp}-defined unlocking, which allows the execution of generic \gls{gp} commands. See \gls{gp} Card Specification \cite{globalplatform_gp_2018}.} the \gls{euicc} for updates, and installing new binaries.

 \paragraph{Exception Handling.}
-The SGP.22 standard defines a variety of response codes and error conditions. We map these response codes to custom exception classes in the \gls{lpa} implementation to enable precise error handling. This is essential for both debugging and for the differential testing framework to reason about diverging behavior across implementations. A code listing of the exception handling mappings is provided in \cref{sec:exception-handling}.
+The SGP.22 standard defines a variety of response codes and error conditions.\marginpar{Custom exceptions provide precise handling of SGP.22-defined response codes.} We map these response codes to custom exception classes in the \gls{lpa} implementation to enable precise error handling. This is essential for both debugging and for the differential testing framework to reason about diverging behavior across implementations. A code listing of the exception handling mappings is provided in \cref{sec:exception-handling}.

 \paragraph{SM-DP+ Client.}
-In addition to \gls{euicc} communication, the \gls{lpa} implementation must interact with the \gls{smdpp} server via the ES9+ interface. Our implementation uses \texttt{httpx} for HTTP interactions and adheres to the expected headers and structure as defined by SGP.22:
+In addition to \gls{euicc} communication, the \gls{lpa} implementation must interact with the \gls{smdpp} server via the ES9+ interface as shown in \cref{img:rsp_architecture}. Our implementation uses \texttt{httpx} for HTTP interactions and adheres to the expected headers and structure as defined by SGP.22:
 \begin{lstlisting}[language=json,caption={ES9+ Request Headers}]
 {
  "Content-Type": "application/json",
@@ -190,6 +196,7 @@ In addition to \gls{euicc} communication, the \gls{lpa} implementation must inte
  "X-Admin-Protocol": "gsma/rsp/v3.1.0"
 }
 \end{lstlisting}
+\marginpar{Deserialization of responses and structured error handling using SM-DP+ client.}

 We encode payload values in Base64 format, as mandated by the specification. To process server responses, we deserialize the returned data using custom \texttt{pydantic} data classes that model the expected structure. In the event of an error response, our implementation raises the appropriate exception, following the error-handling logic outlined in the previous section.
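To make the encoding and error-handling steps concrete, the following Python sketch builds a request body with Base64-encoded values and decodes one field of a response. It uses only the standard library instead of \texttt{httpx} and \texttt{pydantic}, and it sends nothing over the network. The JSON field names (\texttt{smdpAddress}, \texttt{euiccChallenge}, \texttt{euiccInfo1}, \texttt{functionExecutionStatus}) follow our reading of the ES9+ \texttt{InitiateAuthentication} function and should be treated as assumptions, as should the helper names and the \texttt{SmdpError} class.

```python
import base64
import json

# Only the two headers visible in the listing above; others are omitted here.
HEADERS = {
    "Content-Type": "application/json",
    "X-Admin-Protocol": "gsma/rsp/v3.1.0",
}


class SmdpError(Exception):
    """Hypothetical exception for a failed ES9+ function execution."""


def build_initiate_authentication(
    smdp_address: str, euicc_challenge: bytes, euicc_info1: bytes
) -> str:
    """Serialize the request body; binary values are Base64-encoded."""
    body = {
        "smdpAddress": smdp_address,
        "euiccChallenge": base64.b64encode(euicc_challenge).decode("ascii"),
        "euiccInfo1": base64.b64encode(euicc_info1).decode("ascii"),
    }
    return json.dumps(body)


def decode_field(raw_response: str, field: str) -> bytes:
    """Return one Base64 field of a response, raising on a non-success status."""
    parsed = json.loads(raw_response)
    status = (
        parsed.get("header", {}).get("functionExecutionStatus", {}).get("status")
    )
    if status != "Executed-Success":
        raise SmdpError(f"SM-DP+ reported status {status!r}")
    return base64.b64decode(parsed[field])
```

A real client would additionally validate the full response structure, which is the role the \texttt{pydantic} data classes play in our implementation.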

@@ -198,6 +205,8 @@ The \gls{smdpp} client is primarily used by our \gls{isdr} application to execut
 \section{Fuzzing}
 \label{sec:fuzzing}

+\todo{Section summary}
+
 \subsection{APDU Fuzzing}
 \label{subsec:apdu_fuzzing}

@@ -281,7 +290,7 @@ To uncover behavioral differences between \gls{euicc} implementations, we implem

 \paragraph{Fuzzing Scenarios and Execution.}

-We perform fuzzing through predefined \emph{scenarios}, which consist of ordered sequences of function calls targeting the \gls{euicc}. Each function within a scenario is executed via our custom \gls{lpa} implementation and serves as a potential mutation point. To ensure a consistent test environment, the scenario runner establishes a fresh PC/SC connection and resets the card into a clean state by invoking the \texttt{eUICCMemoryReset} operation. This includes processing all pending notifications and performing a full memory wipe prior to execution.
+We perform fuzzing through predefined \emph{scenarios}, which consist of ordered sequences of function calls targeting the \gls{euicc}. Each function within a scenario is executed via our custom \gls{lpa} implementation and serves as a potential mutation point.\marginpar{Scenarios define structured test sequences where each function is a mutation target.} To ensure a consistent test environment, the scenario runner establishes a fresh PC/SC connection and resets the card into a clean state by invoking the \texttt{eUICCMemoryReset} operation. This includes processing all pending notifications and performing a full memory wipe prior to execution.

 To systematically track the fuzzing process, we developed an \textbf{operation recorder} that tracks every function invocation, the applied mutations, and the corresponding responses. This data is structured as a hierarchical \emph{mutation tree}, where each node represents a function call with a specific mutation applied. Each level in the tree corresponds to a function in the scenario, while sibling nodes denote alternative mutations of the same function. \cref{img:class_basic} shows how the \textbf{operation recorder} integrates into \sysname.

@@ -297,15 +306,15 @@ To systematically track the fuzzing process, we developed an \textbf{operation r
 We designed the mutation engine to support both \textit{deterministic} and \textit{random} mutation modes. It implements the following strategies for data transformation:

 \begin{itemize}
-\item \textbf{Bit Flip:} In this strategy, individual bits within the payload are flipped to introduce low-level perturbations. The number of bits to flip is determined by the mutation rate $M$ in proportion to the length $L$ of the payload: $max(1, L \cdot M)$. In deterministic mode, the bit positions are computed using a fixed formula: the byte index $B_I$ is calculated as $(i \cdot 31) \mod L$ and the specific bit to flip within that byte is $(i \cdot 7) \mod 8$ with $i$ indicating the index of the current flip. This approach ensures consistent mutation offsets across runs, thereby facilitating reproducibility.
+\item \textbf{Bit Flip:} In this strategy, individual bits within the payload are flipped to introduce low-level perturbations. The number of bits to flip is determined by the mutation rate $M$ in proportion to the length $L$ of the payload: $\max(1, L \cdot M)$.\marginpar{Bit Flip alters bits at computed offsets.} In deterministic mode, the bit positions are computed using a fixed formula: the byte index $B_I$ is calculated as $(i \cdot 31) \bmod L$ and the specific bit to flip within that byte is $(i \cdot 7) \bmod 8$, with $i$ indicating the index of the current flip. This approach ensures consistent mutation offsets across runs, thereby facilitating reproducibility.

-\item \textbf{Random Byte:} This mutation strategy replaces specific bytes with deterministic pseudo-random values. Similar to the bit flip strategy, the number of mutations is derived from the mutation rate $M$ and length $L$ of the payload. The byte index is computed using $(i \cdot 29) \mod L$, and the replacement value is calculated as $(i \cdot 13) \mod 256$ where is the index of the current mutation. Although the name suggests randomness, in deterministic mode these substitutions are reproducible due to the deterministic index-value derivation.
+\item \textbf{Random Byte:} This mutation strategy replaces specific bytes with deterministic pseudo-random values. Similar to the bit flip strategy, the number of mutations is derived from the mutation rate $M$ and length $L$ of the payload.\marginpar{Random Byte replaces bytes using pseudo-random values.} The byte index is computed using $(i \cdot 29) \bmod L$, and the replacement value is calculated as $(i \cdot 13) \bmod 256$, where $i$ is the index of the current mutation. Although the name suggests randomness, in deterministic mode these substitutions are reproducible due to the deterministic index-value derivation.

-\item \textbf{Zero Block:} A contiguous sequence of bytes is replaced with zeroes to simulate data loss or corruption. The mutation engine deterministically selects the starting index as $\lfloor \frac{L}{4} \rfloor \mod \max(1, L - 20)$ and replaces the next ten bytes (up to the end of the data). This method introduces a predictable null block, which is especially useful for observing system behavior under conditions of zeroed memory regions.
+\item \textbf{Zero Block:} A contiguous sequence of bytes is replaced with zeroes to simulate data loss or corruption.\marginpar{Zero Block inserts a null region to simulate data loss.} The mutation engine deterministically selects the starting index as $\lfloor \frac{L}{4} \rfloor \bmod \max(1, L - 20)$ and replaces the next ten bytes (up to the end of the data). This method introduces a predictable null block, which is especially useful for observing system behavior under conditions of zeroed memory regions.

-\item \textbf{Shuffle Block:} To alter the structure of the payload while preserving local data, the input is first partitioned into fixed-size blocks (16 bytes each). These blocks are then reordered deterministically based on a checksum-like function, specifically by sorting according to the sum of bytes in each block modulo 256.
+\item \textbf{Shuffle Block:} To alter the structure of the payload while preserving local data,\marginpar{Shuffle Block reorders 16-byte blocks.} the input is first partitioned into fixed-size blocks (16 bytes each). These blocks are then reordered deterministically based on a checksum-like function, specifically by sorting according to the sum of bytes in each block modulo 256.

-\item \textbf{Truncation:} The mutation engine simulates incomplete transmissions or premature message termination by truncating the payload at a fixed ratio. Specifically, the payload is cut at 75\% of its original length. This type of mutation is particularly relevant in fuzzing protocols or parsers that may not handle end-of-stream conditions robustly.
+\item \textbf{Truncation:} The mutation engine simulates incomplete transmissions or premature message termination by truncating the payload at a fixed ratio.\marginpar{Truncation cuts payload at 75\% length.} Specifically, the payload is cut at 75\% of its original length. This type of mutation is particularly relevant in fuzzing protocols or parsers that may not handle end-of-stream conditions robustly.
 \end{itemize}
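
As a rough illustration, the deterministic variants of the five strategies above can be sketched in Python as follows. The function names, the integer truncation of $L \cdot M$, and the handling of empty payloads are assumptions made for this sketch and are not taken from the actual mutation engine.

```python
def num_mutations(length: int, rate: float) -> int:
    """max(1, L * M); truncating the product to an integer is an assumption."""
    return max(1, int(length * rate))


def bit_flip(data: bytes, rate: float) -> bytes:
    out = bytearray(data)
    if not out:
        return bytes(out)
    for i in range(num_mutations(len(out), rate)):
        byte_index = (i * 31) % len(out)
        bit_index = (i * 7) % 8
        out[byte_index] ^= 1 << bit_index
    return bytes(out)


def random_byte(data: bytes, rate: float) -> bytes:
    out = bytearray(data)
    if not out:
        return bytes(out)
    for i in range(num_mutations(len(out), rate)):
        out[(i * 29) % len(out)] = (i * 13) % 256
    return bytes(out)


def zero_block(data: bytes) -> bytes:
    out = bytearray(data)
    if not out:
        return bytes(out)
    start = (len(out) // 4) % max(1, len(out) - 20)
    end = min(start + 10, len(out))  # ten bytes, or fewer at the end of the data
    out[start:end] = bytes(end - start)
    return bytes(out)


def shuffle_blocks(data: bytes, block_size: int = 16) -> bytes:
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    # Stable sort by the checksum-like key: sum of the block's bytes modulo 256.
    blocks.sort(key=lambda block: sum(block) % 256)
    return b"".join(blocks)


def truncate(data: bytes) -> bytes:
    # Keep the first 75% of the payload (integer arithmetic, rounded down).
    return data[: len(data) * 3 // 4]
```

Because every offset and value depends only on the payload length and the mutation index $i$, repeated calls on the same input yield identical output, which is the reproducibility property the deterministic mode relies on.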

 Deterministic mode ensures reproducibility by applying mutations at fixed, formula-derived offsets, whereas the random mode selects mutation targets probabilistically at runtime. Both modes behave similarly to the deterministic and non-deterministic mutation modes used in AFLPlusPlus~\cite{fioraldi_afl_2020}.
@@ -316,18 +325,20 @@ Deterministic mode ensures reproducibility by applying mutations at fixed, formu
 \Cref{fig:scenario_flow} illustrates the \gls{apdu} fuzzing workflow, which we structured into four main steps:

 \begin{enumerate}
-  \item \textbf{Mutation selection:} The operation recorder decides the next mutation to apply based on a depth-first traversal of the mutation tree. If all mutations for the current function are exhausted, the runner searches for unexplored child nodes.
+  \item \textbf{Mutation selection:} The operation recorder decides the next mutation to apply based on a depth-first traversal of the mutation tree. If all mutations for the current function are exhausted, the runner searches for unexplored child nodes.\marginpar{APDU fuzzing mutates scenario functions, sends inputs to eUICCs, and records responses in a tree.}
  \item \textbf{\gls{apdu} mutation:} We apply the selected mutation to the original \gls{apdu} using the mutation engine.
  \item \textbf{\gls{apdu} transmission:} The mutated \gls{apdu} is sent to the \gls{euicc}. We record success or failure in the current mutation tree node.
  \item \textbf{Recording:} We save the response or exception in the corresponding mutation tree node for further analysis.
 \end{enumerate}

 \begin{figure}
-	\centering
-    \input{Graphics/record_scenario_flow.tikz}
-    % \resizebox{\textwidth}{!}{\input{Graphics/record_scenario_flow.tikz}}
-    \caption{Flow for recording a scenario.}
-    \label{fig:scenario_flow}
+        \begin{adjustwidth}{-1.5in}{-.5in} 
+    	\centering
+        \input{Graphics/record_scenario_flow.tikz}
+        % \resizebox{\textwidth}{!}{\input{Graphics/record_scenario_flow.tikz}}
+        \caption{Flow for recording a scenario.}
+        \label{fig:scenario_flow}
+    \end{adjustwidth}
 \end{figure}

 We repeat this process for all functions defined in the scenario, producing a complete mutation tree (see \cref{fig:tree_structure}) that captures all inputs, outputs, and error states.
@@ -350,26 +361,28 @@ We repeat this process for all functions defined in the scenario, producing a co
 % if so return the mutation type of the child that still has not tried mutation types -> brings us on the subtree where a child has not tried mutations -> next time this function is called we return the new not tried mutation type
 % if child does not have any not tried mutations: we return the NoneNode of that child i.e the mutation type of the child that was successfully executed and did not make any mutations. idea: continue down the good path to find untried mutations

-The decision process for selecting the next mutation to apply is a key component of the fuzzing framework and is handled entirely by the \texttt{Operation\-Recorder}. Its responsibility is to ensure that all mutations are eventually applied to each function within a scenario while maintaining a consistent and deterministic traversal order across runs.
+The decision process for selecting the next mutation to apply is a key component of the fuzzing framework and is handled entirely by the \texttt{Operation\-Recorder}.\marginpar{OperationRecorder deterministically selects the next mutation using structured tree traversal.} Its responsibility is to ensure that all mutations are eventually applied to each function within a scenario while maintaining a consistent and deterministic traversal order across runs.

 \begin{figure}
-	\centering
-    \input{Graphics/determine_next_mutation_flow.tikz}
-    \caption{Flow on how to determine the next mutation that should be used.}
-    \label{fig:next_mutation_flow}
+    \begin{adjustwidth}{-1.5in}{-.5in} 
+    	\centering
+        \input{Graphics/determine_next_mutation_flow.tikz}
+        \caption{Flow on how to determine the next mutation that should be used.}
+        \label{fig:next_mutation_flow}
+    \end{adjustwidth}
 \end{figure}

 Our algorithm, illustrated in \cref{fig:next_mutation_flow}, operates based on the current node in the mutation tree. Each node represents a function invocation, and its children represent the same invocation with different mutations. The logic proceeds as follows:

 \begin{enumerate}
  \item \textbf{Check for untried mutations at the current node:}  
-  The recorder checks whether the current node has already created child nodes for every defined mutation type (e.g., bitflip, zero-block, truncate, etc.). If there are untried mutation types, it selects one of them, creates a new child node with that mutation, sets it as the new current node, and returns the selected mutation type.
+  The recorder checks whether the current node has already created\marginpar{Recorder recursively explores child nodes to find untested mutations.} child nodes for every defined mutation type (e.g., bitflip, zero-block, truncate). If there are untried mutation types, it selects one of them, creates a new child node with that mutation, sets it as the new current node, and returns the selected mutation type.
  
  \item \textbf{Recursive traversal of child nodes:}  
  If all mutation types have already been tried at the current node (i.e., all child mutations are present), the recorder traverses the subtree rooted at each child node. For each child, it checks if there are any untried mutations deeper in the tree.

  \item \textbf{Descent via valid (None) paths:}  
-  If no untried mutations are found among the children, the recorder follows the \texttt{NoneNode} child—representing the unmutated, successful execution of the function. This path is presumed to lead to deeper parts of the tree where further mutations might be unexplored. In essence, this descent along the ``clean'' path enables the system to reach other branches that may still contain untested mutations.
+  If no untried mutations are found among the children, the recorder follows the \texttt{NoneNode} child—representing\marginpar{Fallback to unmutated path enables deeper traversal of untested branches.} the unmutated, successful execution of the function. This path is presumed to lead to deeper parts of the tree where further mutations might be unexplored. In essence, this descent along the ``clean'' path enables the system to reach other branches that may still contain untested mutations.

  \item \textbf{Backtrack or complete:}  
  If the entire subtree from the current node has been fully explored (i.e., all mutations at all levels are exhausted), the recorder signals completion by returning a sentinel (e.g., \texttt{None}) to the scenario runner.
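
One possible reading of the four steps above is sketched below in Python. The class layout, the string \texttt{"none"} standing in for the \texttt{NoneNode}, the depth limit given by the scenario length, and the choice to inspect mutated children before the unmutated one are assumptions of this sketch. The actual \texttt{Operation\-Recorder} may differ in these details.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

NONE = "none"  # stands in for the unmutated NoneNode path


@dataclass
class Node:
    depth: int
    children: Dict[str, "Node"] = field(default_factory=dict)


class OperationRecorder:
    def __init__(self, mutations: List[str], scenario_length: int) -> None:
        self.mutations = [NONE] + [m for m in mutations if m != NONE]
        self.max_depth = scenario_length
        self.root = Node(depth=0)
        self.current = self.root

    def reset(self) -> None:
        """Return to the root before each scenario run."""
        self.current = self.root

    def _has_untried(self, node: Node) -> bool:
        if node.depth >= self.max_depth:
            return False
        if len(node.children) < len(self.mutations):
            return True
        return any(self._has_untried(child) for child in node.children.values())

    def next_mutation(self) -> Optional[str]:
        node = self.current
        if node.depth >= self.max_depth:
            return None
        # Step 1: an untried mutation type at the current node.
        for mutation in self.mutations:
            if mutation not in node.children:
                node.children[mutation] = Node(depth=node.depth + 1)
                self.current = node.children[mutation]
                return mutation
        # Step 2: a mutated child whose subtree still has untried mutations.
        for mutation in self.mutations:
            if mutation != NONE and self._has_untried(node.children[mutation]):
                self.current = node.children[mutation]
                return mutation
        # Step 3: descend along the unmutated path if it still has work left.
        if self._has_untried(node.children[NONE]):
            self.current = node.children[NONE]
            return NONE
        # Step 4: the whole subtree is exhausted.
        return None
```

With a single mutation type and a two-function scenario, repeated runs (each starting with \texttt{reset()}) visit all four mutation combinations exactly once before \texttt{next\_mutation()} returns \texttt{None} at the root.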
@@ -392,7 +405,7 @@ To preserve fuzzing results, the entire mutation tree is serialized and stored u

 \paragraph{Differential Testing.}

-After multiple cards are fuzzed with the same scenario, their corresponding mutation trees are compared to identify behavioral discrepancies. This is done via depth-first traversal of the trees:
+After multiple cards are fuzzed with the same scenario,\marginpar{Differential testing compares mutation trees across cards to identify behavior mismatches.} their corresponding mutation trees are compared to identify behavioral discrepancies. This is done via depth-first traversal of the trees:

 \begin{itemize}
  \item Trees must have equivalent structure (same function call order and mutation types).
@@ -478,11 +491,11 @@ This differential testing method highlights edge-case inconsistencies across \gl
 % this allows us to test failed input against other cards when running the fuzzing against them -> differential testing

 While APDU-level fuzzing (see \cref{subsec:apdu_fuzzing}) is useful for evaluating command behavior across different \gls{euicc} implementations, it suffers from the drawback that random mutations, particularly at the bit or byte level, often invalidate the structured \gls{asn1} encoding. As a result, many \gls{apdu} mutations are immediately rejected as malformed, limiting the coverage and effectiveness of the test campaign.
-
+\marginpar{Data fuzzing targets high-level LPA inputs to preserve structure while testing edge cases.}
 To address this limitation, we introduce a complementary \textit{data fuzzing} approach, based on Design 3 in \cref{subsec:design_3}, that operates at the semantic level by fuzzing the input arguments of high-level \gls{lpa} function calls. This enables us to maintain structural validity while still exercising a wide variety of edge cases in the data provided to the \gls{euicc}. Our implementation builds on property-based testing frameworks designed for Python, in particular the \texttt{hypothesis} library~\cite{maciver_hypothesis_2019}.

 \paragraph{Fuzzing with Hypothesis.}
-Hypothesis is a property-based testing framework, which allows developers to define \textit{strategies} for input data. The framework then generates test cases based on these strategies and attempts to explore edge cases through randomized sampling and shrinking. Unlike traditional random fuzzing, Hypothesis ensures that generated inputs conform to the structural invariants defined by the strategy, thereby increasing the likelihood of discovering subtle logic errors in protocol handling.
+Hypothesis is a property-based testing framework, which allows developers to define \textit{strategies} for input data.\marginpar{Property-based testing with Hypothesis ensures input validity and semantic diversity.} The framework then generates test cases based on these strategies and attempts to explore edge cases through randomized sampling and shrinking. Unlike traditional random fuzzing, Hypothesis ensures that generated inputs conform to the structural invariants defined by the strategy, thereby increasing the likelihood of discovering subtle logic errors in protocol handling.

 Hypothesis integrates seamlessly with \texttt{pytest} and uses the \texttt{@given} decorator to specify input generation strategies. For example, given the \gls{asn1} structure defined in the SGP.22 specification for the \texttt{Get\-Profile\-Info} function:

@@ -523,7 +536,7 @@ def test_get_profiles(self, use_iccid, profile_class, tags):
 This approach preserves the semantics and structure of the expected \gls{asn1} types while still allowing a wide variety of edge cases to be exercised.

 \paragraph{Implementation Scope.}
-Due to reliance on external infrastructure for the \gls{rsp} process, such as the \gls{smdpp} server, our fuzzing campaign focuses exclusively on the \gls{euicc}-side of the \gls{rsp} protocol. Invalid structured fuzzing requests directed at the \gls{smdpp} would lead to excessive traffic and could be misinterpreted as \gls{dos} attempts. Therefore, we restrict our tests to those functions defined in the ES10a, ES10b, and ES10c interfaces of the SGP.22 specification, which form the communication layer between the \gls{lpa} and the \gls{euicc}, specifically focusing on functions that accept structured input arguments and directly interact with the \gls{euicc}.
+Due to reliance on external infrastructure for the \gls{rsp} process, such as the \gls{smdpp} server, our fuzzing campaign focuses exclusively on the \gls{euicc} side of the \gls{rsp} protocol. Invalid structured fuzzing requests directed at the \gls{smdpp} would lead to excessive traffic and could be misinterpreted as \gls{dos} attempts.\marginpar{Fuzzing is limited to eUICC-side functions to avoid DoS risks on SM-DP+.} Therefore, we restrict our tests to the functions defined in the ES10a, ES10b, and ES10c interfaces of the SGP.22 specification (see \cref{img:rsp_architecture}), which form the communication layer between the \gls{lpa} and the \gls{euicc}. Among these, we focus on functions that accept structured input arguments and directly interact with the \gls{euicc}.


 Specifically, we implemented fuzzing tests for the following functions:
@@ -556,14 +569,13 @@ During the \texttt{setUpClass} phase, a PC/SC link is initialized, and the \gls{

 \paragraph{Error Classification.}
 According to the SGP.22 specification, many functions may return a generic \texttt{UndefinedError} in response to unexpected or malformed input. In our implementation, exceptions raised by the \gls{euicc} that map to well-defined error codes (i.e., subclasses of \texttt{EuiccException}) are not treated as test failures. These represent handled errors indicating that the input was invalid but the card responded appropriately.
-
+\marginpar{Well-defined EuiccExceptions indicate valid error handling and are not treated as failures.}
+\todo{Check Undefined Error}
 By contrast, when an \texttt{UndefinedError} is returned, we treat this as a potential indicator of an unhandled internal error or inconsistent implementation behavior. These cases are flagged for further investigation. Additionally, exceptions occurring outside the \gls{euicc}, such as Python \texttt{AssertionError}s or test harness failures, are treated as bugs in the testing infrastructure and are logged separately.
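
The classification described above can be expressed as a small Python sketch. The \texttt{Outcome} enum, the \texttt{classify} helper, and the assumption that \texttt{UndefinedError} is itself a subclass of \texttt{EuiccException} are illustrative choices and may not match the class hierarchy of our implementation.

```python
from enum import Enum
from typing import Callable


class EuiccException(Exception):
    """Base class for errors reported by the eUICC."""


class UndefinedError(EuiccException):
    """Generic error; assumed here to derive from EuiccException."""


class Outcome(Enum):
    PASSED = "passed"
    HANDLED_ERROR = "handled_error"  # well-defined error code, not a failure
    FLAGGED = "flagged"              # UndefinedError, needs investigation
    HARNESS_BUG = "harness_bug"      # raised outside the eUICC


def classify(call: Callable[[], object]) -> Outcome:
    """Run one fuzzed call and sort its result into the categories above."""
    try:
        call()
    except UndefinedError:
        # Must be checked before the base class, since it is a subclass here.
        return Outcome.FLAGGED
    except EuiccException:
        return Outcome.HANDLED_ERROR
    except Exception:
        return Outcome.HARNESS_BUG
    return Outcome.PASSED
```

Ordering the \texttt{except} clauses from most to least specific keeps the generic error from being silently absorbed into the handled category.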

-\todo{Explain how we use differential testing in this context}
-
 \paragraph{Conclusion.}
 By combining property-based data generation with structural knowledge of \gls{asn1} types, we extend the fuzzing coverage of the \gls{euicc} interface beyond what is possible with \gls{apdu} mutation alone. This enables the discovery of semantic inconsistencies and unhandled corner cases in \gls{euicc} implementations, especially when compared across different vendors during differential testing as shown in \cref{sec:data_fuzzing_evaluation}.
-
+\marginpar{Data fuzzing complements APDU mutation by testing semantic correctness of structured inputs.}
 Hypothesis automatically records any failing test cases to local storage. For each failure, the corresponding input is saved and a unique hash is printed to allow reproducible identification of the triggering input. These previously failing test cases are automatically re-executed during future fuzzing runs prior to generating new test data. This mechanism enables us to efficiently validate whether the same input leads to diverging behavior across different \glspl{euicc}, thereby supporting systematic and automated differential testing.


@@ -593,6 +605,7 @@ By combining property-based data generation with structural knowledge of \gls{as


 While the implemented library provides a programmatic interface to the \gls{lpa} and \gls{euicc} operations, many users, especially testers and engineers, require a more accessible method for interacting with the system. For this reason, we provide a fully featured \gls{cli} that exposes all major functionalities of the system, including \gls{apdu} tracing, \gls{lpa} operations, and fuzzing workflows.
+\marginpar{CLI offers user-friendly access to tracing, LPA operations, and fuzzing features.}

 The \gls{cli} is built using Python’s standard \texttt{argparse} module for argument parsing, extended with \texttt{argcomplete} to enable shell auto-completion. For improved readability and formatting of terminal output, the \texttt{rich} library is used. This combination allows for an interactive, user-friendly \gls{cli} with both developer ergonomics and production readiness in mind.
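
The general shape of such a \gls{cli} can be sketched with \texttt{argparse} subcommands. The program name, subcommand names, and options below are hypothetical placeholders and do not reflect the actual command set. The \texttt{argcomplete} and \texttt{rich} integrations are omitted to keep the sketch within the standard library.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with one subcommand per major feature area."""
    parser = argparse.ArgumentParser(
        prog="esim-tool",  # hypothetical program name
        description="Tracing, LPA operations, and fuzzing for eUICCs.",
    )
    subparsers = parser.add_subparsers(dest="command", required=True)

    trace = subparsers.add_parser("trace", help="record APDU traffic")
    trace.add_argument("--output", default="recording.json",
                       help="file to store the recording in")

    lpa = subparsers.add_parser("lpa", help="run an LPA operation")
    lpa.add_argument("operation", choices=["list", "enable", "disable", "delete"])

    fuzz = subparsers.add_parser("fuzz", help="run a fuzzing campaign")
    fuzz.add_argument("--mode", choices=["apdu", "data"], default="apdu")

    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(f"selected command: {args.command}")
```

Grouping features under subcommands keeps each workflow's options separate while sharing a single entry point.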