Update on Overleaf.

2026-02-04 11:07:43 +00:00 · 2025-07-14 23:39:17 +00:00
parent 19fc98c9ac
commit bc0d25ba87
8 changed files with 300 additions and 103 deletions
--- a/Chapters/Implementation.tex
+++ b/Chapters/Implementation.tex
@@ -50,16 +50,14 @@ Our tracing functionality comprises two main operations:
    \item \textbf{Replaying:} Replays previously recorded \gls{apdu} sequences to an \gls{euicc} in a PC/SC card reader. It replaces context-specific identifiers and checks for discrepancies in response behavior.
 \end{itemize}

-\begin{figure}[h!]
-    \includesvg[width=\textwidth]{Graphics/trace_setup.svg}
-    \caption{Tracing lab setup}
-    \label{img:trace_setup}
-    \todo{Add \sysname onto pc image and reference this figure in text}
+\begin{figure}[t]
+    \centering
+    \includesvg[width=.7\textwidth,inkscapelatex=false]{Graphics/reSIMulate_class_tracer.svg}
+    \caption{Simplified overview of components.}
+    \label{img:class_tracer}
 \end{figure}

-\todo{Overview of software components}
-
-The implementation consists of several key components:
+The implementation consists of several key components, as shown in \cref{img:class_tracer}:

 \begin{description}
    \item[\texttt{PcscLink}] A thin wrapper over the Python \texttt{pyscard} library~\cite{rousseau_pyscard_2025}, which abstracts away low-level communication with PC/SC-compatible card readers. It handles session establishment, \glspl{apdu}/\gls{tpdu} transmission, and automatic processing of status words such as \texttt{61XX} (i.e., triggering \texttt{GET RESPONSE} when necessary).
@@ -72,7 +70,7 @@ The implementation consists of several key components:

    \item[\texttt{recording}] An abstraction for a recorded session. It stores the list of \glspl{apdu}, associated source and target \texttt{\gls{isdr}} addresses, and metadata. It provides serialization functions for saving to and loading from disk, as well as validity checks to determine whether a recording is replayable.

-    \item[\texttt{replay}] Loads a saved \texttt{recording}, connects to the target \gls{euicc} via \texttt{PcscLink}, and replays each \glspl{apdu}. During replay, the source and target \gls{isdr} values are automatically substituted. The response status words from the target \gls{euicc} are compared against those from the original trace. Any mismatch is reported to highlight divergent behavior.
+    \item[\texttt{replayer}] Loads a saved \texttt{recording}, connects to the target \gls{euicc} via \texttt{PcscLink}, and replays each \gls{apdu}. During replay, the source and target \gls{isdr} values are automatically substituted. The response status words from the target \gls{euicc} are compared against those from the original trace. Any mismatch is reported to highlight divergent behavior.
 \end{description}

 This modular structure allows for easy integration into both automated test pipelines and manual inspection tools, and lays the groundwork for both our mutation-based and structure-aware fuzzing techniques described in subsequent sections.
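
The replay step described for \texttt{replayer} can be illustrated with a minimal Python sketch. It is not the project's code: \texttt{RecordedApdu}, \texttt{replay}, and the \texttt{transmit} callback are hypothetical names, and the \gls{isdr} substitution is assumed here to be a plain byte replacement.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class RecordedApdu:
    """One recorded command and the status word it originally produced."""
    command: bytes
    status_word: str  # hex string such as "9000"


def replay(
    recording: List[RecordedApdu],
    transmit: Callable[[bytes], Tuple[bytes, str]],
    source_isdr: bytes,
    target_isdr: bytes,
) -> List[Tuple[int, str, str]]:
    """Replay each APDU and collect (index, expected_sw, actual_sw) mismatches."""
    mismatches = []
    for index, entry in enumerate(recording):
        # Assumption: the recorded ISD-R identifier appears verbatim in the command.
        command = entry.command.replace(source_isdr, target_isdr)
        _data, actual_sw = transmit(command)
        if actual_sw != entry.status_word:
            mismatches.append((index, entry.status_word, actual_sw))
    return mismatches
```

Passing the transport as a callback keeps the comparison logic testable without a card reader; in a real setup the callback would wrap the PC/SC link.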
@@ -154,10 +152,10 @@ Due to the inability of the \texttt{tracer} implementation to accurately replay

 The \gls{lpa} is composed of multiple components:

-\paragraph{Card}
+\paragraph{Card.}
 Represents the \gls{euicc} currently inserted into the PC/SC card reader. Upon initialization, it scans the card for supported applications, identifying the applicable \gls{adf} through probing. This is necessary because eSIM-on-SIM implementations often use proprietary \glspl{adf} that diverge from those specified in the SGP.22 standard, as we evaluate in \cref{sec:eval_tracing}. The card object keeps track of the selected application to avoid unnecessary reselection and traffic.

-\paragraph{PC/SC Link}
+\paragraph{PC/SC Link.}
 This component is based on \texttt{pySim}'s \texttt{LinkBaseTpdu}. It establishes an exclusive connection to the PC/SC reader to maintain session state consistency, which is required due to the stateful nature of \gls{euicc} interactions. During initialization:
 \begin{itemize}
  \item The supported transmission protocol (T=0 or T=1) is detected.
@@ -165,8 +163,8 @@ This component is based on \texttt{pySim}'s \texttt{LinkBaseTpdu}. It establishe
 \end{itemize}
 It handles both \gls{apdu} and \gls{tpdu} transmission, automatically requesting additional data when status words such as \texttt{9FXX}, \texttt{61XX}, \texttt{62XX}, or \texttt{63XX} are encountered. When enabled, it invokes an optional mutation engine before sending \glspl{apdu} (see \cref{subsec:apdu_fuzzing}) and also records all traffic for later analysis.
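
As a rough illustration of the automatic follow-up requests, the sketch below handles only the \texttt{61XX} case by issuing \texttt{GET RESPONSE} until the card stops reporting pending data. The function name and the injected \texttt{transmit} callable are assumptions; the callable is modelled on the \texttt{(data, sw1, sw2)} tuple that \texttt{pyscard} connections return, and the other status words mentioned above are not covered.

```python
from typing import Callable, List, Tuple

# Shape of a pyscard-style transmit call: APDU bytes in, (data, sw1, sw2) out.
Transmit = Callable[[List[int]], Tuple[List[int], int, int]]


def transmit_with_get_response(
    transmit: Transmit, apdu: List[int], cla: int = 0x00
) -> Tuple[List[int], int, int]:
    """Send an APDU and keep issuing GET RESPONSE while the card reports 61XX."""
    data, sw1, sw2 = transmit(apdu)
    collected = list(data)
    while sw1 == 0x61:
        # SW2 is the number of bytes still available (0x00 means 256).
        get_response = [cla, 0xC0, 0x00, 0x00, sw2]
        data, sw1, sw2 = transmit(get_response)
        collected.extend(data)
    return collected, sw1, sw2
```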

-\paragraph{Application}
-Each euicc application (e.g., \gls{isdr}, \gls{ecasd}, ESTK firmware update) is implemented with application-specific logic and communicates with the card via the \texttt{pcsc\_link}. The application layer abstracts encoding/decoding and command sending. For instance, the \texttt{store\_data} command is handled internally using \texttt{asn1tools} for encoding and decoding.
+\paragraph{Application.}
+Each \gls{euicc} application (\eg, \gls{isdr}, \gls{ecasd}) is implemented with application-specific logic and communicates with the card via the \texttt{pcsc\_link}. The application layer abstracts encoding/decoding and command sending. For instance, the \texttt{store\_data} command is handled internally using \texttt{asn1tools} for encoding and decoding.

 Known \glspl{adf} for \gls{isdr} observed during analysis:
 \begin{itemize}
@@ -180,10 +178,10 @@ To decoded response data for further processing, we use \texttt{pydantic} data c

 The \texttt{estk\_fwupd} application implements a proprietary firmware update interface, which we reverse-engineered (see \cref{sec:eval_tracing}). It supports reading the current firmware version, unlocking\footnote{This unlocking is distinct from \gls{gp}-defined unlocking, which allows the execution of generic \gls{gp} commands. See \gls{gp} Card Specification \cite{globalplatform_gp_2018}.} the \gls{euicc} for updates, and installing new binaries.

-\paragraph{Exception Handling}
+\paragraph{Exception Handling.}
 The SGP.22 standard defines a variety of response codes and error conditions. We map these response codes to custom exception classes in the \gls{lpa} implementation to enable precise error handling. This is essential both for debugging and for allowing the differential testing framework to reason about diverging behavior across implementations. A code listing of the exception handling mappings is provided in \cref{sec:exception-handling}.

-\paragraph{SM-DP+ Client}
+\paragraph{SM-DP+ Client.}
 In addition to \gls{euicc} communication, the \gls{lpa} implementation must interact with the \gls{smdpp} server via the ES9+ interface. Our implementation uses \texttt{httpx} for HTTP interactions and adheres to the expected headers and structure as defined by SGP.22:
 \begin{lstlisting}[language=json,caption={ES9+ Request Headers}]
 {
@@ -281,13 +279,19 @@ The \gls{smdpp} client is primarily used by our \gls{isdr} application to execut

 To uncover behavioral differences between \gls{euicc} implementations, we implemented a fuzzing framework that mutates valid \glspl{apdu} generated via our custom \gls{lpa} implementation based on Design 2 in \cref{subsec:design_2}. Unlike the tracing-and-compare approach described earlier, the fuzzing strategy dynamically constructs valid request data and intentionally mutates it prior to transmission, allowing for meaningful analysis of error-handling behavior across cards.

-\subsubsection*{Fuzzing Scenarios and Execution}
+\paragraph{Fuzzing Scenarios and Execution.}

 We perform fuzzing through predefined \emph{scenarios}, which consist of ordered sequences of function calls targeting the \gls{euicc}. Each function within a scenario is executed via our custom \gls{lpa} implementation and serves as a potential mutation point. To ensure a consistent test environment, the scenario runner establishes a fresh PC/SC connection and resets the card into a clean state by invoking the \texttt{eUICCMemoryReset} operation. This includes processing all pending notifications and performing a full memory wipe prior to execution.

-To systematically track the fuzzing process, we developed an \textbf{operation recorder} that tracks every function invocation, the applied mutations, and the corresponding responses. This data is structured as a hierarchical \emph{mutation tree}, where each node represents a function call with a specific mutation applied. Each level in the tree corresponds to a function in the scenario, while sibling nodes denote alternative mutations of the same function. 
+To systematically track the fuzzing process, we developed an \textbf{operation recorder} that logs every function invocation, the applied mutations, and the corresponding responses. This data is structured as a hierarchical \emph{mutation tree}, where each node represents a function call with a specific mutation applied. Each level in the tree corresponds to a function in the scenario, while sibling nodes denote alternative mutations of the same function. \Cref{img:class_basic} shows how the \textbf{operation recorder} integrates into \sysname.

-\subsubsection*{Mutation Engine}
+\begin{figure}[t]
+    \includesvg[width=\textwidth,inkscapelatex=false]{Graphics/reSIMualte_class_basic}
+    \caption{Simplified class diagram of the core classes.}
+    \label{img:class_basic}
+\end{figure}
+
+\paragraph{Mutation Engine.}
 \label{subsubsec:mutation_engine}

 We designed the mutation engine to support both \textit{deterministic} and \textit{random} mutation modes. It implements the following strategies for data transformation:
@@ -307,7 +311,7 @@ We designed the mutation engine to support both \textit{deterministic} and \text
 Deterministic mode ensures reproducibility by applying mutations at fixed, formula-derived offsets, whereas the random mode selects mutation targets probabilistically at runtime. Both modes behave similarly to the deterministic and non-deterministic mutation modes used in AFLPlusPlus~\cite{fioraldi_afl_2020}.
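
The difference between the two modes can be sketched with a single bit-flip strategy. This is an illustration under stated assumptions, not the engine's actual code: the function names are hypothetical, the offset formula \texttt{step mod (number of bits)} is one possible choice, and the input is assumed to be non-empty.

```python
import random


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with the bit at bit_index (MSB first) inverted."""
    mutated = bytearray(data)
    mutated[bit_index // 8] ^= 0x80 >> (bit_index % 8)
    return bytes(mutated)


def deterministic_bitflip(data: bytes, step: int) -> bytes:
    """Flip the bit at an offset derived only from the step counter."""
    return flip_bit(data, step % (len(data) * 8))


def random_bitflip(data: bytes, rng: random.Random) -> bytes:
    """Flip a bit chosen at runtime by the supplied random number generator."""
    return flip_bit(data, rng.randrange(len(data) * 8))
```

Because the deterministic variant depends only on the input and the step counter, rerunning a campaign visits the same mutations in the same order, whereas the random variant is repeatable only when the generator is seeded.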


-\subsubsection*{Fuzzing Workflow}
+\paragraph{Fuzzing Workflow.}

 \Cref{fig:scenario_flow} illustrates the \gls{apdu} fuzzing workflow, which we structured into four main steps:

@@ -335,7 +339,7 @@ We repeat this process for all functions defined in the scenario, producing a co
    \label{fig:tree_structure}
 \end{figure}

-\subsubsection*{Determine Next Mutation Logic}
+\paragraph{Determine Next Mutation Logic.}
 % shown in figure4 (flow graph on how to determine next mutation)
 % goals we want to try all mutations for each node
 % handled by operation recorder and next mutation is requeststed by pcsc link
@@ -378,15 +382,15 @@ This strategy is both exhaustive and progress-aware. It ensures that:
  \item The fuzzing process remains deterministic and resumable due to the structured tree format.
 \end{itemize}

-\subsubsection*{Error Handling and Retry Logic}
+\paragraph{Error Handling and Retry Logic.}

 Errors during execution are logged and associated with the current mutation node. If a function fails (e.g., due to protocol state loss or card reset), the runner resets the PC/SC link and the card, then resumes execution. This ensures that failures do not corrupt the mutation tree and allows exploration to continue.

-\subsubsection*{Scenario Persistence and Reuse}
+\paragraph{Scenario Persistence and Reuse.}

 To preserve fuzzing results, the entire mutation tree is serialized and stored using Python's \texttt{pickle} module in a \texttt{.resim} file. This enables post-analysis, comparison across card models, and reproducibility for future \gls{euicc} versions.
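
A minimal sketch of this persistence step is shown below. The node layout (a nested dictionary with \texttt{function}, \texttt{mutation}, \texttt{response}, and \texttt{children} keys) and the helper names are assumptions chosen for illustration; only the use of \texttt{pickle} and the \texttt{.resim} extension come from the text.

```python
import pickle
from pathlib import Path
from typing import Any, Dict, Optional, Union

Node = Dict[str, Any]


def new_node(
    function_name: str,
    mutation: Optional[str] = None,
    response: Optional[str] = None,
) -> Node:
    """Create one mutation-tree node (hypothetical layout)."""
    return {
        "function": function_name,
        "mutation": mutation,
        "response": response,
        "children": [],
    }


def save_tree(root: Node, path: Union[str, Path]) -> None:
    """Serialize the whole tree to a .resim file."""
    with Path(path).open("wb") as handle:
        pickle.dump(root, handle)


def load_tree(path: Union[str, Path]) -> Node:
    """Load a tree; unpickle only files from a trusted source."""
    with Path(path).open("rb") as handle:
        return pickle.load(handle)
```

Since unpickling can execute arbitrary code, \texttt{.resim} files obtained from third parties should be treated like executables.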

-\subsubsection*{Differential Testing}
+\paragraph{Differential Testing.}

 After multiple cards are fuzzed with the same scenario, their corresponding mutation trees are compared to identify behavioral discrepancies. This is done via depth-first traversal of the trees:

@@ -468,11 +472,16 @@ This differential testing method highlights edge-case inconsistencies across \gl
 % on the other hand an undefined error is still handled be the euicc but could not be properly handled -> could mean that there is a potential bug in the implementation and we need to do some further investigation into to this particular function call
 % -> euicc exceptions are ignored unless they are an UndefinedError

+% for each failure, hypothesis automatically saves the failing test case to a local file
+% it saves the input and prints a hash to identify the failed test input
+% failed runs are automatically tested again on future runs before generating new test cases
+% this allows us to test failed input against other cards when running the fuzzing against them -> differential testing
+
 While APDU-level fuzzing (see \cref{subsec:apdu_fuzzing}) is useful for evaluating command behavior across different \gls{euicc} implementations, it suffers from the drawback that random mutations, particularly at the bit or byte level, often invalidate the structured \gls{asn1} encoding. As a result, many \gls{apdu} mutations are immediately rejected as malformed, limiting the coverage and effectiveness of the test campaign.

 To address this limitation, we introduce a complementary \textit{data fuzzing} approach, based on Design 3 in \cref{subsec:design_3}, that operates at the semantic level by fuzzing the input arguments of high-level \gls{lpa} function calls. This enables us to maintain structural validity while still exercising a wide variety of edge cases in the data provided to the \gls{euicc}. Our implementation builds on property-based testing frameworks designed for Python, in particular the \texttt{hypothesis} library~\cite{maciver_hypothesis_2019}.

-\paragraph{Fuzzing with Hypothesis}
+\paragraph{Fuzzing with Hypothesis.}
 Hypothesis is a property-based testing framework, which allows developers to define \textit{strategies} for input data. The framework then generates test cases based on these strategies and attempts to explore edge cases through randomized sampling and shrinking. Unlike traditional random fuzzing, Hypothesis ensures that generated inputs conform to the structural invariants defined by the strategy, thereby increasing the likelihood of discovering subtle logic errors in protocol handling.

 Hypothesis integrates seamlessly with \texttt{pytest} and uses the \texttt{@given} decorator to specify input generation strategies. For example, given the \gls{asn1} structure defined in the SGP.22 specification for the \texttt{Get\-Profile\-Info} function:
@@ -513,7 +522,7 @@ def test_get_profiles(self, use_iccid, profile_class, tags):

 This approach preserves the semantics and structure of the expected \gls{asn1} types while still allowing a wide variety of edge cases to be exercised.

-\paragraph{Implementation Scope}
+\paragraph{Implementation Scope.}
 Due to reliance on external infrastructure for the \gls{rsp} process, such as the \gls{smdpp} server, our fuzzing campaign focuses exclusively on the \gls{euicc}-side of the \gls{rsp} protocol. Invalid structured fuzzing requests directed at the \gls{smdpp} would lead to excessive traffic and could be misinterpreted as \gls{dos} attempts. Therefore, we restrict our tests to those functions defined in the ES10a, ES10b, and ES10c interfaces of the SGP.22 specification, which form the communication layer between the \gls{lpa} and the \gls{euicc}, specifically focusing on functions that accept structured input arguments and directly interact with the \gls{euicc}.


@@ -542,19 +551,21 @@ Specifically, we implemented fuzzing tests for the following functions:
    \end{itemize}
 \end{itemize}

-\paragraph{Fuzzing Lifecycle}
+\paragraph{Fuzzing Lifecycle.}
 During the \texttt{setUpClass} phase, a PC/SC link is initialized, and the \gls{euicc} is prepared (\eg, by installing a test profile) to ensure the preconditions for each function are met. After executing the class's test suite, the \texttt{eUICCMemoryReset} function is called with all reset options enabled to restore a clean state. All leftover notifications are processed to leave the card in a consistent state for subsequent tests.

-\paragraph{Error Classification}
+\paragraph{Error Classification.}
 According to the SGP.22 specification, many functions may return a generic \texttt{UndefinedError} in response to unexpected or malformed input. In our implementation, exceptions raised by the \gls{euicc} that map to well-defined error codes (i.e., subclasses of \texttt{EuiccException}) are not treated as test failures. These represent handled errors indicating that the input was invalid but the card responded appropriately.

 By contrast, when an \texttt{UndefinedError} is returned, we treat this as a potential indicator of an unhandled internal error or inconsistent implementation behavior. These cases are flagged for further investigation. Additionally, exceptions occurring outside the \gls{euicc}, such as Python \texttt{AssertionError}s or test harness failures, are treated as bugs in the testing infrastructure and are logged separately.
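
The three-way classification can be sketched as follows. \texttt{EuiccException} and \texttt{UndefinedError} are named in the text, but their relationship (here \texttt{UndefinedError} subclasses \texttt{EuiccException}) is an assumption, and \texttt{Verdict}, \texttt{classify}, and \texttt{HandledExampleError} are hypothetical names.

```python
from enum import Enum


class EuiccException(Exception):
    """Base class for errors reported by the eUICC."""


class UndefinedError(EuiccException):
    """Generic error code; assumed here to be a EuiccException subclass."""


class HandledExampleError(EuiccException):
    """Hypothetical stand-in for any well-defined SGP.22 error code."""


class Verdict(Enum):
    HANDLED = "handled"          # card rejected the input appropriately
    INVESTIGATE = "investigate"  # possible unhandled internal error
    HARNESS_BUG = "harness_bug"  # failure outside the eUICC


def classify(error: BaseException) -> Verdict:
    """Map an exception raised during a fuzzing test to a verdict."""
    # UndefinedError is checked first because it is also a EuiccException here.
    if isinstance(error, UndefinedError):
        return Verdict.INVESTIGATE
    if isinstance(error, EuiccException):
        return Verdict.HANDLED
    return Verdict.HARNESS_BUG
```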

 \todo{Explain how we use differential testing in this context}

-\paragraph{Conclusion}
+\paragraph{Conclusion.}
 By combining property-based data generation with structural knowledge of \gls{asn1} types, we extend the fuzzing coverage of the \gls{euicc} interface beyond what is possible with \gls{apdu} mutation alone. This enables the discovery of semantic inconsistencies and unhandled corner cases in \gls{euicc} implementations, especially when compared across different vendors during differential testing as shown in \cref{sec:data_fuzzing_evaluation}.

+Hypothesis automatically records any failing test cases to local storage. For each failure, the corresponding input is saved and a unique hash is printed to allow reproducible identification of the triggering input. These previously failing test cases are automatically re-executed during future fuzzing runs prior to generating new test data. This mechanism enables us to efficiently validate whether the same input leads to diverging behavior across different \glspl{euicc}, thereby supporting systematic and automated differential testing.
+

 \section{CLI}
 \label{sec:cli}
@@ -587,8 +598,8 @@ The \gls{cli} is built using Python’s standard \texttt{argparse} module for ar

 The CLI structure is further detailed in \cref{sec:cli_structure}.

-\paragraph{Integration with Pytest}
+\paragraph{Integration with Pytest.}
 The data fuzzing component internally wraps \texttt{pytest}, leveraging the structure of Python test classes defined with the Hypothesis framework (cf.~\cref{subsec:data_fuzzing}). Each test class corresponds to a group of \gls{rsp} commands. By invoking the data fuzzing \gls{cli}, all available test classes are executed against the connected \gls{euicc}, with proper initialization and teardown logic handled automatically.

-\paragraph{Extensibility}
+\paragraph{Extensibility.}
 The \gls{cli} is designed with extensibility as a primary concern. Adding new commands requires minimal effort: developers only need to create a new subfolder, define a \texttt{run()} function, and register the new command in the main \gls{cli} dispatcher. Moreover, the \gls{cli} is completely decoupled from the core library logic, ensuring that library users are not forced to depend on the \gls{cli} subsystem and vice versa.
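
The registration pattern described above can be sketched with \texttt{argparse} subcommands. The command names (\texttt{trace}, \texttt{fuzz}), the \texttt{--reader} option, and the \texttt{COMMANDS} table are hypothetical; in the described layout each \texttt{run()} function would live in its own subfolder rather than in one file.

```python
import argparse
from typing import Callable, Dict, List, Optional


def run_trace(args: argparse.Namespace) -> str:
    """Hypothetical run() function of a 'trace' command."""
    return f"tracing on reader {args.reader}"


def run_fuzz(args: argparse.Namespace) -> str:
    """Hypothetical run() function of a 'fuzz' command."""
    return f"fuzzing on reader {args.reader}"


# Registering a new command means adding one entry to this table.
COMMANDS: Dict[str, Callable[[argparse.Namespace], str]] = {
    "trace": run_trace,
    "fuzz": run_fuzz,
}


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="cli-sketch")
    subparsers = parser.add_subparsers(dest="command", required=True)
    for name, handler in COMMANDS.items():
        sub = subparsers.add_parser(name)
        sub.add_argument("--reader", type=int, default=0)
        # Store the handler so the dispatcher needs no per-command branching.
        sub.set_defaults(run=handler)
    return parser


def main(argv: Optional[List[str]] = None) -> str:
    args = build_parser().parse_args(argv)
    return args.run(args)
```

The dispatcher never imports library internals beyond the registered handlers, which mirrors the decoupling between the \gls{cli} and the core library.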