Update on Overleaf.

2026-02-04 11:07:43 +00:00 · 2025-07-14 12:48:46 +00:00
parent f9130f1aee
commit 19fc98c9ac
4 changed files with 175 additions and 62 deletions
--- a/Chapters/Implementation.tex
+++ b/Chapters/Implementation.tex
@@ -27,55 +27,7 @@ The primary goal of this thesis is to conduct a security analysis of commercial

 To perform differential testing, we designed a structured fuzzing methodology that employs both valid and mutated \gls{apdu} sequences. By observing and comparing how multiple \glspl{euicc} respond to the same inputs, we aim to uncover deviations that may indicate security flaws or implementation weaknesses.

-\section{Design}
-
-This section presents the step-by-step refinement of the testing strategy. The initial approach relied on recording and replaying \gls{apdu} traces as a basic method for interacting with \glspl{esim}. As the design progressed, it incorporated communication via the \gls{lpa}, and was eventually extended to include fuzzing capabilities at \gls{apdu} level. This evolution enabled more comprehensive and fine-grained differential testing across multiple \gls{euicc} implementations. 
-
-\paragraph{Initial Naive Approach}
-
-We first implemented a simple observation setup using the \texttt{simtrace2} tool. \texttt{simtrace2}~\cite{osmocom_simtrace_nodate} allows monitoring of communication between a physical device (typically a smartphone acting as the \gls{lpa}) and a \gls{sim} card. The tool captures \glspl{apdu} and forwards them via \gls{udp} packets to a local socket. From there, we parsed and analyzed the \gls{apdu} data.
-
-Our proposed methodology involved the following steps:
-\begin{enumerate}
-    \item Record the \gls{apdu} traffic between the \gls{lpa} and the \gls{euicc} during an \gls{rsp} session.
-    \item Store this traffic in a structured format.
-    \item Replace the original \gls{euicc} with another one inserted into a PC/SC-compatible card reader.
-    \item Replay each recorded \gls{apdu} and monitor the response.
-\end{enumerate}
-
-The goal was to detect behavioral differences, such as differing \glspl{sw} or execution failures. However, we discovered that this method was impractical in real-world scenarios. Due to the nature of the \gls{rsp} protocol, many operations involve cryptographic bindings using session-specific nonces, rendering traffic replay infeasible.
-
-\paragraph{Controlled LPA Implementation}
-
-To address the limitations of passive traffic replay, we developed our own minimal and controllable \gls{lpa}. Instead of relying on proprietary \gls{lpa} applications supplied by \gls{esim} vendors, we opted to implement a custom solution for two key reasons:
-
-\begin{itemize}
-    \item Vendor \glspl{lpa} often introduce extraneous or undocumented traffic unrelated to the provisioning process, which complicates analysis.
-    \item A custom \gls{lpa} allows for controlled mutation and injection of \gls{apdu} sequences.
-\end{itemize}
-
-The implemented \gls{lpa} performs a target operation (e.g., profile download or enablement) by issuing the appropriate command sequence to the \gls{euicc} in the PC/SC card reader. Prior to transmission, we programmatically mutate \glspl{apdu} to test the implementation’s robustness against malformed or unexpected input. We then record the resulting status words and assess behavioral consistency across different \gls{euicc} devices.
-
-While our approach allows for more precise control, it has some drawbacks. \gls{rsp} is a stateful protocol, and provisioning actions rely on interaction with the profile vendor's \gls{smdpp} server. Consequently, execution speed is constrained by network latency and backend responsiveness, as well as by the need to restore the \gls{euicc} state after a reset.
-
-\paragraph{Fuzzing Strategy}
-
-When applying mutations to \gls{apdu} messages, we encountered a common issue: random mutations frequently produce invalid \gls{asn1} structures. This narrows the testing focus to the \gls{asn1} decoder, which represents only a small part of the total \gls{euicc} logic. Even so, fuzzing at the decoding layer can yield valuable results, as parsing flaws in \gls{asn1}-based decoders have historically led to critical vulnerabilities~\cite{mitre_cve_2003, nist_nvd_2024, nist_nvd_2025}.
-
-To improve the depth and scope of our fuzzing efforts, we adapted our implementation to generate and mutate structurally valid input instead. By preserving the syntactic and semantic correctness of \gls{asn1} structures, we enabled the fuzzer to exercise deeper layers of application logic. This allowed us to test state transitions, logical constraints, and error handling mechanisms that would otherwise remain untriggered by malformed data.
-
-To support this structured fuzzing approach, we integrated the Python-based \texttt{hypothesis} library, which provides property-based testing capabilities~\cite{maciver_hypothesis_2019}. Using \texttt{hypothesis}, we defined input schemas mirroring the \gls{asn1} structures employed in the SGP.22 specification~\cite{gsma_sgp22_2025}. The framework then automatically generates valid input covering a wide range of edge cases.
-
-With this setup, we were able to test:
-\begin{itemize}
-    \item Field boundary conditions (e.g., maximum tag lengths).
-    \item Rare but valid combinations of optional elements.
-    \item Complex nesting of \gls{tlv} structures.
-\end{itemize}
-
-In the following sections, we present the technical implementation details of our \gls{lpa} logic, input mutation framework, and fuzzing harness.
-
-\section{Tracing}
+\section{Tracing and Replay}
 \label{sec:tracing}

 % functions: 
@@ -89,7 +41,7 @@ In the following sections, we present the technical implementation details of ou
 % - recording: represents a list of recorded \glspl{apdu}, handles source and target isd-r addresses, file saving and loading as well as checking if the file is replayable
 % - replay: establishes connection to pcsc via pcsc link, loads recorded \glspl{apdu} and sends them over the link to the connected euicc, switches out source isd-r and target isd-r during replay, compares response status word to recorded status word and prints an error if there is a difference

-We built the tracing component to capture and interpret \glspl{apdu} exchanged between an \gls{lpa} (or other source) and the \gls{euicc}, and to replay them by inserting the recorded \glspl{apdu} into the communication between the \gls{lpa} and the \gls{euicc}. This forms the foundation of the differential testing framework by allowing the same interaction sequence to be executed across multiple \glspl{euicc} for behavioral comparison.
+We built the tracing component based on Design 1 in \cref{subsec:design_1} to capture and interpret \glspl{apdu} exchanged between an \gls{lpa} (or other source) and the \gls{euicc}, and to replay them by inserting the recorded \glspl{apdu} into the communication between the \gls{lpa} and the \gls{euicc}. This forms the foundation of the differential testing framework by allowing the same interaction sequence to be executed across multiple \glspl{euicc} for behavioral comparison.

 Our tracing functionality comprises two main operations:

@@ -112,15 +64,15 @@ The implementation consists of several key components:
 \begin{description}
    \item[\texttt{PcscLink}] A thin wrapper over the Python \texttt{pyscard} library~\cite{rousseau_pyscard_2025}, which abstracts away low-level communication with PC/SC-compatible card readers. It handles session establishment, \glspl{apdu}/\gls{tpdu} transmission, and automatic processing of status words such as \texttt{61XX} (i.e., triggering \texttt{GET RESPONSE} when necessary).

-    \item[\texttt{Card}] Represents a connected card in a PC/SC reader. It queries the card to determine its type (e.g., standard \gls{sim}, test \gls{euicc}, or commercial \gls{euicc}), and identifies installed applications such as \texttt{\gls{isdr}} or \texttt{\gls{ecasd}}. The class serves as the interface for sending \glspl{apdu} to the card through the \texttt{pcsc\_link}.
+    \item[\texttt{Card}] Represents a connected card in a PC/SC reader. It queries the card to determine its type (e.g., standard \gls{sim}, test \gls{euicc}, or commercial \gls{euicc}), and identifies installed applications such as \gls{isdr} or \gls{ecasd}. The class serves as the interface for sending \glspl{apdu} to the card through the \texttt{pcsc\_link}.

    \item[\texttt{Tracer}] A dummy implementation of the \texttt{Card} interface used during passive tracing. It parses incoming \glspl{apdu} from the GSMTAP interface using \texttt{pysim} and attempts to classify them based on instruction type. This allows mapping observed \glspl{apdu} to functional operations.

-    \item[\texttt{Recorder}] Coordinates tracing and recording. It spawns a separate tracer thread that listens for \glspl{apdu} from GSMTAP in a loop until a timeout occurs or a stop signal is issued. \glspl{apdu} are recorded alongside the designated target \texttt{\gls{isdr}} for later analysis.
+    \item[\texttt{Recorder}] Coordinates tracing and recording. It spawns a separate tracer thread that listens for \glspl{apdu} from GSMTAP in a loop until a timeout occurs or a stop signal is issued. \glspl{apdu} are recorded alongside the designated target \gls{isdr} for later analysis.

    \item[\texttt{recording}] An abstraction for a recorded session. It stores the list of \glspl{apdu}, associated source and target \texttt{\gls{isdr}} addresses, and metadata. It provides serialization functions for saving to and loading from disk, as well as validity checks to determine whether a recording is replayable.

-    \item[\texttt{replay}] Loads a saved \texttt{recording}, connects to the target \gls{euicc} via \texttt{PcscLink}, and replays each \glspl{apdu}. During replay, the source and target \texttt{\gls{isdr}} values are automatically substituted. The response status words from the target \gls{euicc} are compared against those from the original trace. Any mismatch is reported to highlight divergent behavior.
+    \item[\texttt{replay}] Loads a saved \texttt{recording}, connects to the target \gls{euicc} via \texttt{PcscLink}, and replays each \gls{apdu}. During replay, the source and target \gls{isdr} values are automatically substituted. The response status words from the target \gls{euicc} are compared against those from the original trace. Any mismatch is reported to highlight divergent behavior.
 \end{description}

 This modular structure allows for easy integration into both automated test pipelines and manual inspection tools, and lays the groundwork for both our mutation-based and structure-aware fuzzing techniques described in subsequent sections.
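
The comparison step performed by \texttt{replay} can be illustrated with a minimal Python sketch. All names here (\texttt{RecordedApdu}, \texttt{substitute\_aid}, the \texttt{transmit} callable) are hypothetical and do not mirror the actual implementation; the sketch also assumes that the source and target \gls{isdr} identifiers have equal length, so that the \texttt{Lc} byte of the command stays valid after substitution.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class RecordedApdu:
    """One captured exchange (hypothetical structure, for illustration only)."""
    command: bytes     # raw command APDU as captured
    status_word: int   # recorded SW1SW2, e.g. 0x9000


# A transport callable: takes a command APDU, returns (response data, SW1SW2).
Transmit = Callable[[bytes], Tuple[bytes, int]]


def substitute_aid(command: bytes, source_aid: bytes, target_aid: bytes) -> bytes:
    """Swap the recorded ISD-R identifier for the one of the target card."""
    if len(source_aid) != len(target_aid):
        # Assumption of this sketch: equal lengths, so Lc needs no update.
        raise ValueError("sketch assumes equal-length identifiers")
    # Naive byte replacement; a real tool would parse the APDU first, since
    # the identifier bytes could also occur by coincidence in other data.
    return command.replace(source_aid, target_aid)


def replay(
    recording: List[RecordedApdu],
    transmit: Transmit,
    source_aid: bytes,
    target_aid: bytes,
) -> List[Tuple[int, int, int]]:
    """Replay every command and collect (index, recorded SW, observed SW)."""
    mismatches = []
    for index, apdu in enumerate(recording):
        command = substitute_aid(apdu.command, source_aid, target_aid)
        _data, observed = transmit(command)
        if observed != apdu.status_word:
            mismatches.append((index, apdu.status_word, observed))
    return mismatches


if __name__ == "__main__":
    # Dummy recording and a fake card that rejects everything except SELECT.
    demo = [
        RecordedApdu(bytes.fromhex("00A4040002AABB"), 0x9000),
        RecordedApdu(bytes.fromhex("80E2910000"), 0x9000),
    ]

    def fake_card(command: bytes) -> Tuple[bytes, int]:
        return b"", 0x9000 if command[1] == 0xA4 else 0x6A80

    for index, recorded, observed in replay(demo, fake_card, b"\xAA\xBB", b"\xCC\xDD"):
        print(f"APDU {index}: recorded {recorded:04X}, observed {observed:04X}")
```

Passing the transport as a callable keeps the comparison logic independent of the reader; with \texttt{pyscard}, such a callable could wrap the connection's \texttt{transmit} method and combine the two returned status bytes into one status word.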
@@ -327,7 +279,7 @@ The \gls{smdpp} client is primarily used by our \gls{isdr} application to execut
 % differences in the tree structure are also handled, i.e., failed mutations and therefore no child nodes
 % - nodes are considered different if the response code is different or it has a different failure reason, i.e., EuiccException or AssertionError (problems occurred outside of the euicc)

-To uncover behavioral differences between \gls{euicc} implementations, we implemented a fuzzing framework that mutates valid \glspl{apdu} generated via our custom \gls{lpa} implementation. Unlike the tracing-and-compare approach described earlier, the fuzzing strategy dynamically constructs valid request data and intentionally mutates it prior to transmission, allowing for meaningful analysis of error-handling behavior across cards.
+To uncover behavioral differences between \gls{euicc} implementations, we implemented a fuzzing framework that mutates valid \glspl{apdu} generated via our custom \gls{lpa} implementation based on Design 2 in \cref{subsec:design_2}. Unlike the tracing-and-compare approach described earlier, the fuzzing strategy dynamically constructs valid request data and intentionally mutates it prior to transmission, allowing for meaningful analysis of error-handling behavior across cards.

 \subsubsection*{Fuzzing Scenarios and Execution}

@@ -516,9 +468,9 @@ This differential testing method highlights edge-case inconsistencies across \gl
 % on the other hand an undefined error is still handled by the euicc but could not be properly handled -> could mean that there is a potential bug in the implementation and we need to do some further investigation into this particular function call
 % -> euicc exceptions are ignored unless they are an UndefinedError

-While APDU-level fuzzing (see \cref{subsec:apdu_fuzzing}) is useful for evaluating command behavior across different \textit{euicc} implementations, it suffers from the drawback that random mutations—particularly at the bit or byte level—often invalidate the structured \gls{asn1} encoding. As a result, many \gls{apdu} mutations are immediately rejected as malformed, limiting the coverage and effectiveness of the test campaign.
+While APDU-level fuzzing (see \cref{subsec:apdu_fuzzing}) is useful for evaluating command behavior across different \gls{euicc} implementations, it suffers from the drawback that random mutations, particularly at the bit or byte level, often invalidate the structured \gls{asn1} encoding. As a result, many \gls{apdu} mutations are immediately rejected as malformed, limiting the coverage and effectiveness of the test campaign.

-To address this limitation, we introduce a complementary \textit{data fuzzing} approach that operates at the semantic level by fuzzing the input arguments of high-level \gls{lpa} function calls. This enables us to maintain structural validity while still exercising a wide variety of edge cases in the data provided to the \gls{euicc}. Our implementation builds on property-based testing frameworks designed for Python, in particular the \texttt{hypothesis} library~\cite{maciver_hypothesis_2019}.
+To address this limitation, we introduce a complementary \textit{data fuzzing} approach, based on Design 3 in \cref{subsec:design_3}, that operates at the semantic level by fuzzing the input arguments of high-level \gls{lpa} function calls. This enables us to maintain structural validity while still exercising a wide variety of edge cases in the data provided to the \gls{euicc}. Our implementation builds on property-based testing frameworks designed for Python, in particular the \texttt{hypothesis} library~\cite{maciver_hypothesis_2019}.

 \paragraph{Fuzzing with Hypothesis}
 Hypothesis is a property-based testing framework, which allows developers to define \textit{strategies} for input data. The framework then generates test cases based on these strategies and attempts to explore edge cases through randomized sampling and shrinking. Unlike traditional random fuzzing, Hypothesis ensures that generated inputs conform to the structural invariants defined by the strategy, thereby increasing the likelihood of discovering subtle logic errors in protocol handling.
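
The contrast between byte-level mutation and structure-preserving mutation can be illustrated with a small, self-contained Python sketch. It deliberately does not use Hypothesis and is not the thesis implementation: the helper names are hypothetical, and the encoder is a simplified \gls{tlv} codec that assumes one-byte tags and short-form lengths only. A bit flip that lands on the length byte yields an encoding the decoder rejects, whereas mutating the decoded value and re-encoding always produces a well-formed structure.

```python
from typing import Tuple


def encode_tlv(tag: int, value: bytes) -> bytes:
    """Encode a simplified TLV (one-byte tag, short-form length below 128)."""
    if not 0 <= tag <= 0xFF or len(value) > 0x7F:
        raise ValueError("sketch supports one-byte tags and short lengths only")
    return bytes([tag, len(value)]) + value


def decode_tlv(data: bytes) -> Tuple[int, bytes]:
    """Decode the simplified TLV and reject inconsistent encodings."""
    if len(data) < 2:
        raise ValueError("truncated header")
    tag, length = data[0], data[1]
    if length > 0x7F:
        raise ValueError("long-form length not supported in this sketch")
    if len(data) - 2 != length:
        raise ValueError("length field does not match payload size")
    return tag, data[2:]


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Byte-level mutation: flip one bit of the encoded message."""
    mutated = bytearray(data)
    mutated[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(mutated)


def mutate_value(data: bytes, new_value: bytes) -> bytes:
    """Structure-aware mutation: change the value, then re-encode."""
    tag, _old_value = decode_tlv(data)
    return encode_tlv(tag, new_value)


if __name__ == "__main__":
    original = encode_tlv(0x5A, bytes(10))

    # Flipping the lowest bit of the length byte breaks the encoding.
    try:
        decode_tlv(flip_bit(original, 8))
    except ValueError as error:
        print("bit flip rejected by decoder:", error)

    # The re-encoded mutation stays decodable and reaches later checks.
    print("structure-aware mutation decodes to:",
          decode_tlv(mutate_value(original, b"\xff" * 3)))
```

In a property-based setup, the argument passed to a function such as \texttt{mutate\_value} would be drawn from a strategy instead of being written by hand, so that the framework chooses the edge cases.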