
% !TeX root = ../Thesis.tex

%************************************************
\chapter{Implementation}\label{ch:implementation}
%************************************************
\glsresetall % Resets all acronyms to not used

% - Goal of this thesis is security analysis using differential testing
% - first idea (naive implementation): use simtrace2 to capture traffic between the LPA (ue) and euicc
% - simtrace2 sends \glspl{apdu} to socket via udp packet -> read data from socket -> analyse apdu command for instruction type
% - save recored traffic to file
% - insert other euicc into pcsc card reader -> replay each apdu to euicc
% - check for differences in the responses
% - problem: rsp uses signed nonces -> can't replay data
% - next idea: implement lpa to perform actions via code -> not rely on manual interaction with esim manufacturer lpa app, manufacturer lpa introduce traffic that is not necessary for the intended action
% - use the lpa to produce traffic for the euicc in the pcsc card reader, but mutate it before sending
% - record the returned status codes and check if different euicc behaves the same (crashes at the same point or returns the same status word)
% - on the slower side -> rsp is stateful and we rely on the sm-dp+ from the profile vendor
% - small problem with apdu mutation: we basically just fuzz the asn1 parser of the euicc sometimes
% - alternative: fuzz valid input data
% - oss-fuzz proposes python hypothesis as a framework for fuzzing via python
% - python hypothesis: property based testing library -> we define input structure and hypothesis produces data that is valid for the given structure
% - tests for edge cases
% - in the following sections i will go into details on how each implementation work

The primary goal of this thesis is to conduct a security analysis of commercial \gls{esim} implementations using differential testing. The underlying idea of this approach is to systematically compare the behavior of different \gls{euicc} implementations under the same inputs to detect inconsistencies or vulnerabilities.

\paragraph{Initial Naive Approach}

The first implementation was based on a straightforward observation setup using the \texttt{simtrace2} tool. \texttt{simtrace2}~\cite{osmocom_simtrace_nodate} allows monitoring of communication between a physical device (typically a smartphone acting as the \gls{lpa}) and a \gls{sim} card. The tool captures \glspl{apdu} and forwards them via \gls{udp} packets to a local socket. From this socket, the \gls{apdu} data can be read, parsed, and analyzed.
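
A minimal sketch of this capture step is given below. It assumes that the trace arrives as \gls{gsmtap}-framed \gls{udp} datagrams on the default \gls{gsmtap} port 4729 and that the second header byte holds the header length in 32-bit words. The function names and the small instruction table are illustrative and are not part of \texttt{simtrace2}.

\begin{lstlisting}[language=Python,caption={Sketch: stripping the GSMTAP header and classifying APDUs}]
import socket

GSMTAP_PORT = 4729  # default GSMTAP UDP port (assumed to be used by the setup)

# Illustrative subset of instruction bytes (ISO 7816-4 / GlobalPlatform).
INS_NAMES = {0xA4: "SELECT", 0xC0: "GET RESPONSE", 0xE2: "STORE DATA"}


def strip_gsmtap_header(datagram: bytes) -> bytes:
    """Return the payload that follows the GSMTAP header."""
    if len(datagram) < 2:
        raise ValueError("datagram too short for a GSMTAP header")
    header_len = datagram[1] * 4  # header length is given in 32-bit words
    if header_len < 2 or len(datagram) < header_len:
        raise ValueError("inconsistent GSMTAP header length")
    return datagram[header_len:]


def classify(apdu: bytes) -> str:
    """Name the instruction of a command APDU (CLA INS P1 P2 ...)."""
    if len(apdu) < 4:
        return "INCOMPLETE"
    return INS_NAMES.get(apdu[1], f"UNKNOWN(0x{apdu[1]:02X})")


def listen(timeout: float = 5.0) -> None:
    """Print the instruction name of every traced APDU until a timeout."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind(("127.0.0.1", GSMTAP_PORT))
        sock.settimeout(timeout)
        try:
            while True:
                datagram, _ = sock.recvfrom(4096)
                print(classify(strip_gsmtap_header(datagram)))
        except socket.timeout:
            pass  # no traffic within the timeout: stop listening
\end{lstlisting}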

The proposed method was to:
\begin{enumerate}
    \item Record the \gls{apdu} traffic between the \gls{lpa} and the \gls{euicc} during an \gls{rsp} session.
    \item Store this traffic in a structured format.
    \item Replace the original \gls{euicc} with another one inserted into a \gls{pcsc}-compatible card reader.
    \item Replay each recorded \gls{apdu} and monitor the response.
\end{enumerate}

The goal was to detect behavioral differences, such as differing \glspl{sw} or execution failures. However, this method proved infeasible in practice due to the nature of the \gls{rsp} protocol: many operations are cryptographically bound to the specific session using signed nonces, meaning that replaying recorded traffic is not possible.

\paragraph{Controlled LPA Implementation}

 To overcome the limitations of passive traffic replay, a new strategy was developed. Rather than relying on the proprietary \gls{lpa} applications often provided by \gls{esim} vendors, we implemented our own minimal \gls{lpa}. The motivation behind this was twofold:

\begin{itemize}
    \item Vendor \glspl{lpa} often introduce extraneous or undocumented traffic unrelated to the provisioning process, which complicates analysis.
    \item A custom \gls{lpa} allows for controlled mutation and injection of \gls{apdu} sequences.
\end{itemize}

The implemented \gls{lpa} performs a target operation (e.g., profile download or enablement) by issuing the appropriate command sequence to the \gls{euicc} in the \gls{pcsc} card reader. Before sending, \glspl{apdu} can be programmatically mutated to evaluate the robustness of the implementation against malformed or unexpected inputs. The \gls{lpa} records the returned status words and checks for behavioral consistency across different \glspl{euicc}.

While this approach allows more precise control, it has drawbacks. \gls{rsp} is a stateful protocol, and provisioning actions rely on interaction with the profile vendor's \gls{smdpp} server. Consequently, execution speed is constrained by network latency, backend responsiveness, and the time needed to restore the \gls{euicc} state after a reset.

\paragraph{Fuzzing Strategy}

A challenge in mutating \gls{apdu} messages is that random mutations often lead to invalid \gls{asn1} structures. This effectively reduces the testing strategy to fuzzing the \gls{asn1} decoder, which is only a small part of the \gls{euicc} logic. To increase test effectiveness, the implementation shifted toward fuzzing \textit{valid structured input} rather than arbitrary byte sequences.

To support structured data fuzzing, this thesis uses the Python-based \texttt{hypothesis} library, which implements property-based testing~\cite{maciver_hypothesis_2019}. \texttt{hypothesis} allows definition of input schemas that mirror \gls{asn1} structures used in \gls{esim} protocols. From these schemas, it automatically generates valid input data covering a wide range of edge cases.

This strategy enables testing of:
\begin{itemize}
    \item Field boundary conditions (e.g., maximum tag lengths).
    \item Rare but valid combinations of optional elements.
    \item Complex nesting of \gls{tlv} structures.
\end{itemize}
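
The boundary conditions listed above can be made concrete without any fuzzing library. The following sketch encodes BER-\gls{tlv} objects with definite lengths and enumerates value lengths around the points where the length field changes size. The helper names are hypothetical, and only single-byte tags are supported.

\begin{lstlisting}[language=Python,caption={Sketch: BER-TLV encoding with boundary lengths}]
from typing import List


def encode_length(length: int) -> bytes:
    """Encode a BER definite length (short form below 128, else long form)."""
    if length < 0:
        raise ValueError("length must be non-negative")
    if length < 0x80:
        return bytes([length])
    body = length.to_bytes((length.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body


def encode_tlv(tag: int, value: bytes) -> bytes:
    """Encode a single-byte tag, the BER length, and the value."""
    if not 0 <= tag <= 0xFF or tag & 0x1F == 0x1F:
        raise ValueError("only single-byte tags are supported in this sketch")
    return bytes([tag]) + encode_length(len(value)) + value


def boundary_lengths() -> List[int]:
    """Value lengths at which the size of the length field changes."""
    return [0, 1, 127, 128, 255, 256]


def nested(depth: int, tag: int = 0xA0) -> bytes:
    """Wrap an empty value in `depth` constructed TLV layers."""
    data = b""
    for _ in range(depth):
        data = encode_tlv(tag, data)
    return data
\end{lstlisting}

Feeding \texttt{boundary\_lengths()} into \texttt{encode\_tlv} yields inputs that remain structurally valid while exercising both the short and the long length form.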

In the following sections, the technical details of each implementation component, including the \gls{lpa} logic, mutation framework, and fuzzing harness, are presented.


\section{Tracing}
\label{sec:tracing}

% functions:
% - trace traffic from the simtrace2, map the traffic to function calls i.e. identify which function the call handles, record the traced traffic
% - replay: replay the previously recorded traffic to euicc in pcsc reader, check for differences in responses
% parts:
% - pcsc_link: wrapper for the python smartcard library, handles session establishment to reader, and apdu/tpdu transmission, automatically handles requesting of available data i.e. status word 61XX
% - card: represents card in the pcsc card reader, identifies card type (i.e sgp22, sgp.22 test, normal sim, etc) and which applications are installed (ISDR, ECASD, etc), used to send \glspl{apdu} to pcsc card through pcsc link
% - tracer: dummy implementation of card for instruction interpretation and apdu parsing, uses pysim gsmtap as apdu source
% - recorder: handles tracer thread and recording of \glspl{apdu}, starts tracer main thread (continously listens for new \glspl{apdu} from gsmtap until timeout is reached or canceld by user) and records apdu to recording, has target isd-r as argument
% - recording: represents a list of recorded \glspl{apdu}, handles source and target isd-r addresses, file saving and loding as well as checking if the file is replayable
% - replay: establishes connection to pcsc via pcsc link, loads recorded \glspl{apdu} and sends them over the link to the connected euicc, switches out source isd-r and target isd-r during replay, compares response status word to recorded status word on prints an error if there is a difference

The tracing component is responsible for capturing, interpreting, and replaying \gls{apdu} communication between an \gls{lpa} (or another source) and the \gls{euicc}. This forms the foundation of the differential testing framework by allowing the same interaction sequence to be executed across multiple \glspl{euicc} for behavioral comparison.

The tracing functionality comprises two main operations:

\begin{itemize}
    \item \textbf{Tracing and recording:} Captures \gls{apdu} traffic from a physical interface using \texttt{simtrace2}~\cite{osmocom_simtrace_nodate} and associates it with functional interpretations (e.g., profile enablement, deletion). The \glspl{apdu} are parsed and stored along with contextual information such as sender and receiver addresses.
    \item \textbf{Replaying:} Replays previously recorded \gls{apdu} sequences to an \gls{euicc} in a \gls{pcsc} card reader. It replaces context-specific identifiers and checks for discrepancies in response behavior.
\end{itemize}

\begin{figure}[h!]
    \includesvg[width=\textwidth]{Graphics/trace_setup.svg}
    \caption{Tracing lab setup}
    \label{img:trace_setup}
\end{figure}

The implementation consists of several key components:

\begin{description}
    \item[\texttt{PcscLink}] A thin wrapper over the Python \texttt{pyscard} library~\cite{rousseau_pyscard_2025}, which abstracts away low-level communication with \gls{pcsc}-compatible card readers. It handles session establishment, \gls{apdu}/\gls{tpdu} transmission, and automatic processing of status words such as \texttt{61XX} (i.e., triggering \texttt{GET RESPONSE} when necessary).

    \item[\texttt{Card}] Represents a connected card in a \gls{pcsc} reader. It queries the card to determine its type (e.g., standard \gls{sim}, test \gls{euicc}, or commercial \gls{euicc}), and identifies installed applications such as \texttt{\gls{isdr}} or \texttt{\gls{ecasd}}. The class serves as the interface for sending \glspl{apdu} to the card through the \texttt{PcscLink}.

    \item[\texttt{Tracer}] A dummy implementation of the \texttt{Card} interface used during passive tracing. It parses incoming \glspl{apdu} from the \gls{gsmtap} interface using \texttt{pysim} and attempts to classify them based on instruction type. This allows mapping observed \glspl{apdu} to functional operations.

    \item[\texttt{Recorder}] Coordinates tracing and recording. It spawns a separate tracer thread that listens for \glspl{apdu} from \gls{gsmtap} in a loop until a timeout occurs or a stop signal is issued. \glspl{apdu} are recorded alongside the designated target \texttt{\gls{isdr}} for later analysis.

    \item[\texttt{recording}] An abstraction for a recorded session. It stores the list of \glspl{apdu}, associated source and target \texttt{\gls{isdr}} addresses, and metadata. It provides serialization functions for saving to and loading from disk, as well as validity checks to determine whether a recording is replayable.

    \item[\texttt{replay}] Loads a saved \texttt{recording}, connects to the target \gls{euicc} via \texttt{PcscLink}, and replays each \gls{apdu}. During replay, the source and target \texttt{\gls{isdr}} values are automatically substituted. The response status words from the target \gls{euicc} are compared against those from the original trace. Any mismatch is reported to highlight divergent behavior.
\end{description}
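
The replay and comparison step can be sketched as follows. This is a simplified illustration rather than the actual \texttt{replay} code: the \texttt{RecordedApdu} type, the \texttt{transmit} callback, and the assumption that source and target \gls{isdr} identifiers have the same length (so that the length byte of the command stays valid) are introduced here for the example only.

\begin{lstlisting}[language=Python,caption={Sketch: replaying a recording and comparing status words}]
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical link interface: command in, (response data, status word) out.
Transmit = Callable[[bytes], Tuple[bytes, int]]


@dataclass
class RecordedApdu:
    command: bytes
    status_word: int  # status word observed in the original trace


def substitute_aid(command: bytes, source: bytes, target: bytes) -> bytes:
    """Replace the recorded ISD-R identifier with the one of the target card."""
    if len(source) != len(target):
        raise ValueError("sketch assumes identifiers of equal length")
    return command.replace(source, target)


def replay(
    recording: List[RecordedApdu],
    transmit: Transmit,
    source_aid: bytes,
    target_aid: bytes,
) -> List[Tuple[int, int, int]]:
    """Replay all commands; return (index, recorded SW, observed SW) mismatches."""
    mismatches = []
    for index, entry in enumerate(recording):
        command = substitute_aid(entry.command, source_aid, target_aid)
        _, status_word = transmit(command)
        if status_word != entry.status_word:
            mismatches.append((index, entry.status_word, status_word))
    return mismatches
\end{lstlisting}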

This modular structure allows for easy integration into both automated test pipelines and manual inspection tools, and lays the groundwork for both mutation-based and structure-aware fuzzing techniques described in subsequent sections.


\section{LPA}
\label{sec:lpa}

% due to the limitations of the tracer to replay rsp correctly -> need for lpa to execute interaction with euicc with valid input
% lpa handles communication over different interfaces as defined in sgp22
% we are using sgp v3.1 -> newest version at the date of writing this thesis
% lpa implementation consists of different parts

% card
% represents euicc that is currently inserted into the pcsc card reader
% once created: starts scanning for supported applications on the card
% checks which application responds for which class, instruction code, and adf
% adf is important: esim on sim applications of deviate from the common adf that is proposed in the sgp22, application implementations contain multiple known adfs -> card selects the one that is used by the euicc
% handles application selection and keeps track of the currently selected application to prevent reselection or unnecessary traffic

% pcsc link
% uses the pySim LinkBaseTpdu class as base
% once initialised it uses given pcsc card reader to establish an exclusive connection to the card reader
% esclusive connection: euicc have a state -> we would loose state if other cards could perform file sections etc in between -> no shared connection with other programs
% during the connection process a few steps happen:
% - check which protocol is supported T=0 or T=1
% - establish connection via given protocol and check for errors
% link can be established via a python context manager -> automatically close connection once context is exited
% handles apdu transmission and tpsu transmission
% apdu transmission also handles some return codes
% - 9FXX, 61XX, 62XX, 63XX: automatically request availble response bytes -> reponse bytes are autmatically attached to orignal r-apdu -> to the caller it appears as one apdu even though in the background multiple \glspl{apdu} were send
% before sending \glspl{apdu} it may also perform mutation by calling the optional mutation engine (ref mutation engine section in  apdu fuzzing)
% as well as records the \glspl{apdu} (ref apdu fuzzing section)

% application
% represents euicc application like isd-r, ecasd etc and implements application specific functionality, also handles apdu communication with pcsc link for application related traffic i.e store_data command
% has main/common adf address and multiple aliases which are used by different vendors for esim implementations
% ADFs for eSIM on SIM applications that we had access to
% using simtrace2 and vendor lpa traffic to find out which adf was used for the isd-r
% Common isd-r adf: A0000005591010FFFFFFFF8900000100
% 5Ber.esim: A0000005591010FFFFFFFF8900050500
% Xesim: A0000005591010FFFFFFFF8900000177
% esim.me: A0000005591010000000008900000300
% current implementations consist of isd-r/isd-p, estk_fwupd

% applications communicate to card via pcsc link
% application functions trigger store_data command -> internally uses asn1tools to encode and decode data
% once data is decoded: application functions use response data specific data classes to parse and validate data
% these data classes use pydantic for serialization and deserialization as well as decoding and encoding of data -> easier to handle base64 encoded data which often is returned by the smdp+ as well as decode special data such as bitstrings, hex strings as well as version types
% implemented using custom decoders and encoders -> makes it easier to read data
% bit strings: chain of bits where each bit represents a function or piece of data that is given or not given -> pydantic serializer mixin makes it easier to represent this information in code and also makes it easier to use this information (i.e when used as library) to check whether a bit is set or not
% order: data dict -> store_data -> use asn1tools to encode -> build apdu -> send apdu -> decode return data -> parse and decode with data class

%isd-r implementation handles all rsp related functions
% estk_fwupd: implements the propriatary estk update mechanism
% this was reverse engineered and is further explained in the findings section (ref findings section)
% can return currently installed firmware version, unlock the euicc to accept and new firmare,  install new binary
% footnote: unlocking the euicc differs from the card unlocking functionality (cite global platform specs which define unlocking) which allows the use of gp commands to for example install new java card applets

%note: maybe explain store_data command in more detail i.e apdu splitting -> indicate that more data follows

% exception handling
% sgp22 defines a possible errors for interactions
% euicc returns error code -> user has to know which kind of error was triggered
% for us its rather important to exactly know which errors were triggered not only to know which of those errors are expected but also to know what went wrong
% exception handling triggers exact exception based on error code (insert code listing which defines errors and raises them based on the exception)

% smdp client
% lpa not only handles communication to euicc but also to the smdp+ server -> client side implementation is necessary -> defined as es9+ interface in sgp22
% uses httpx as base for http communication
% sgp22 defines that the header should indicate the supported rsp version
% { "Content-Type": "application/json", "User-Agent": "gsma-rsp-lpad", "X-Admin-Protocol": "gsma/rsp/v3.1.0" }
% server only accepts json data in body (cite sgp22 definition) -> each value of the key/value pair is base64 encoded
% pydantic is used for deserialization of response data
% before returning the data to the caller -> client checks for error on server and eventually raises the corresponding exception -> as explained in the exception handling part
% smdp+ client is mostly used by the isd-r

Due to the limitations of the \texttt{tracer} implementation in correctly replaying \gls{rsp} interactions, a dedicated \gls{lpa} implementation was developed to initiate valid interactions with the \gls{euicc}. This enables the controlled generation and mutation of valid traffic, which is explained further in \cref{sec:fuzzing}. Our implementation targets the \gls{sgp22} v3.1 specification, which was the latest version available at the time of writing.

The \gls{lpa} is composed of multiple components:

\paragraph{Card}
Represents the \gls{euicc} currently inserted into the \gls{pcsc} card reader. Upon initialization, it scans the card for supported applications, identifying the applicable \gls{adf} through probing. This is necessary as eSIM-on-SIM implementations often use proprietary \glspl{adf}, diverging from the \glspl{adf} specified in the \gls{sgp22} standard. The card object keeps track of the selected application to reduce unnecessary reselection and traffic.

\paragraph{PC/SC Link}
This component is based on \texttt{pySim}'s \texttt{LinkBaseTpdu}. It establishes an exclusive connection to the \gls{pcsc} reader to maintain session state consistency, which is required due to the stateful nature of \gls{euicc} interactions. During initialization:
\begin{itemize}
  \item The supported transmission protocol (T=0 or T=1) is detected.
  \item A connection is established and validated.
\end{itemize}
It handles both \gls{apdu} and \gls{tpdu} transmission, automatically requesting additional data when status words such as \texttt{9FXX}, \texttt{61XX}, \texttt{62XX}, or \texttt{63XX} are encountered. When enabled, it invokes an optional mutation engine before sending \glspl{apdu} (see \cref{subsec:apdu_fuzzing}) and also records all traffic for later analysis.
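
The automatic retrieval of pending response data can be sketched as shown below. The sketch covers only the \texttt{61XX} case and always uses class byte \texttt{00} for \texttt{GET RESPONSE}; the \texttt{raw\_transmit} callback and the function name are assumptions made for illustration and do not reflect the \texttt{pySim} interface.

\begin{lstlisting}[language=Python,caption={Sketch: collecting response data announced via 61XX}]
from typing import Callable, Tuple

# Hypothetical low-level interface: command in, (data, SW1, SW2) out.
RawTransmit = Callable[[bytes], Tuple[bytes, int, int]]


def transmit_with_get_response(
    raw_transmit: RawTransmit, command: bytes, max_rounds: int = 32
) -> Tuple[bytes, int]:
    """Send a command and append all data fetched with GET RESPONSE."""
    data, sw1, sw2 = raw_transmit(command)
    collected = bytearray(data)
    rounds = 0
    while sw1 == 0x61:  # SW2 holds the number of bytes still available
        if rounds == max_rounds:
            raise RuntimeError("card keeps announcing more data")
        # GET RESPONSE: CLA=00 INS=C0 P1=00 P2=00 Le=SW2
        data, sw1, sw2 = raw_transmit(bytes([0x00, 0xC0, 0x00, 0x00, sw2]))
        collected += data
        rounds += 1
    # To the caller this looks like a single response APDU.
    return bytes(collected), (sw1 << 8) | sw2
\end{lstlisting}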

\paragraph{Application}
Each \gls{euicc} application (e.g., \gls{isdr}, \gls{ecasd}, ESTK firmware update) is implemented with application-specific logic and communicates with the card via the \texttt{pcsc\_link}. The application layer abstracts encoding/decoding and command sending. For instance, the \texttt{store\_data} command is handled internally using \texttt{asn1tools} for encoding and decoding.

Known \glspl{adf} for \gls{isdr} observed during analysis:
\begin{itemize}
  \item Common: \texttt{A0000005591010FFFFFFFF8900000100}
  \item 5Ber.esim: \texttt{A0000005591010FFFFFFFF8900050500}
  \item Xesim: \texttt{A0000005591010FFFFFFFF8900000177}
  \item esim.me: \texttt{A0000005591010000000008900000300}
\end{itemize}
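
Probing for the applicable \gls{adf} can be sketched as a loop over the identifiers listed above, sending a \texttt{SELECT} by name for each candidate until the card accepts one. The \texttt{transmit} callback and the function names are hypothetical, and the actual \texttt{Card} class performs further checks that are omitted here.

\begin{lstlisting}[language=Python,caption={Sketch: probing known ISD-R identifiers with SELECT}]
from typing import Callable, Optional, Tuple

# Identifiers observed during analysis (see the list above).
KNOWN_ISDR_AIDS = {
    "common": "A0000005591010FFFFFFFF8900000100",
    "5ber.esim": "A0000005591010FFFFFFFF8900050500",
    "xesim": "A0000005591010FFFFFFFF8900000177",
    "esim.me": "A0000005591010000000008900000300",
}

# Hypothetical link interface: command in, (response data, status word) out.
Transmit = Callable[[bytes], Tuple[bytes, int]]


def build_select(aid: bytes) -> bytes:
    """SELECT by name: CLA=00 INS=A4 P1=04 P2=00 Lc, followed by the AID."""
    if not 5 <= len(aid) <= 16:
        raise ValueError("an application identifier is 5 to 16 bytes long")
    return bytes([0x00, 0xA4, 0x04, 0x00, len(aid)]) + aid


def probe_isdr(transmit: Transmit) -> Optional[Tuple[str, bytes]]:
    """Return the first (name, AID) pair the card accepts, or None."""
    for name, aid_hex in KNOWN_ISDR_AIDS.items():
        aid = bytes.fromhex(aid_hex)
        _, status_word = transmit(build_select(aid))
        # 9000 signals success; 61XX signals success with response data pending.
        if status_word == 0x9000 or status_word >> 8 == 0x61:
            return name, aid
    return None
\end{lstlisting}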

The decoded response data is further processed using \texttt{pydantic} data classes. These enable structured parsing of values including Base64-encoded strings, bitfields, version types, and more. Custom encoders/decoders are used to simplify readability and downstream data processing. For bit fields, a mixin is used to allow checking for specific feature flags via simple accessors.
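
The idea behind the bit-field accessors can be illustrated independently of \texttt{pydantic}. The sketch below assumes that a decoded bit string is available as its raw bytes together with the number of valid bits, and that bit 0 is the most significant bit of the first byte, as in ASN.1. The class and flag names are hypothetical.

\begin{lstlisting}[language=Python,caption={Sketch: named access to bit string flags}]
from typing import List, Tuple


class BitStringFlags:
    """Named access to the bits of a decoded bit string."""

    names: Tuple[str, ...] = ()

    def __init__(self, data: bytes, bit_count: int) -> None:
        if not 0 <= bit_count <= len(data) * 8:
            raise ValueError("bit count does not fit the given data")
        self._data = data
        self._bit_count = bit_count

    def is_set(self, name: str) -> bool:
        """Check one flag; bits that were not transmitted count as unset."""
        index = self.names.index(name)  # bit 0 is the most significant bit
        if index >= self._bit_count:
            return False
        return bool(self._data[index // 8] & (0x80 >> (index % 8)))

    def enabled(self) -> List[str]:
        """Return the names of all flags that are set."""
        return [name for name in self.names if self.is_set(name)]


class ExampleCapabilities(BitStringFlags):
    # Hypothetical flag names; real ones come from the ASN.1 definitions.
    names = ("featureA", "featureB", "featureC")
\end{lstlisting}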

The \texttt{estk\_fwupd} application implements a proprietary firmware update interface, which was reverse-engineered (see \cref{sec:findings}). It supports reading the current firmware version, unlocking\footnote{This unlocking is distinct from \gls{gp}-defined unlocking, which allows the execution of generic \gls{gp} commands. See \gls{gp} Card Specification.} the \gls{euicc} for updates, and installing new binaries.

\paragraph{Exception Handling}
The \gls{sgp22} standard defines a variety of response codes and error conditions. The \gls{lpa} library maps these response codes to custom exception classes for precise error handling. This is essential both for debugging and for allowing the differential testing framework to reason about diverging behavior across implementations. A code listing of the exception handling mappings is provided in \cref{sec:exception-handling}.
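
The mapping principle can be sketched in a few lines. The class names other than the base class, the table, and the numeric result codes are illustrative placeholders that would have to be checked against the result code definitions in \gls{sgp22}; the sketch is not the listing referenced above.

\begin{lstlisting}[language=Python,caption={Sketch: mapping result codes to exception classes}]
from typing import Dict, Optional, Type


class EuiccException(Exception):
    """Base class for errors reported by the eUICC."""

    def __init__(self, code: int) -> None:
        super().__init__(f"{type(self).__name__} (result code {code})")
        self.code = code


class IccidOrAidNotFound(EuiccException):
    pass


class ProfileNotInDisabledState(EuiccException):
    pass


class UndefinedError(EuiccException):
    pass


# Illustrative table for one operation; the values are assumptions.
ENABLE_PROFILE_ERRORS: Dict[int, Type[EuiccException]] = {
    1: IccidOrAidNotFound,
    2: ProfileNotInDisabledState,
    127: UndefinedError,
}


def error_for(
    code: int, mapping: Dict[int, Type[EuiccException]]
) -> Optional[EuiccException]:
    """Return the exception for a result code, or None for success (0)."""
    if code == 0:
        return None
    return mapping.get(code, EuiccException)(code)


def raise_on_error(code: int, mapping: Dict[int, Type[EuiccException]]) -> None:
    """Raise the specific exception that belongs to a non-zero result code."""
    error = error_for(code, mapping)
    if error is not None:
        raise error
\end{lstlisting}

Because every result code maps to its own class, a test can distinguish an expected rejection from an unexpected failure by the exception type alone.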

\paragraph{SM-DP+ Client}
In addition to \gls{euicc} communication, the \gls{lpa} must interact with the \gls{smdpp} server via the ES9+ interface. Our implementation uses \texttt{httpx} for HTTP interactions and adheres to the expected headers and structure as defined by \gls{sgp22}:
\begin{lstlisting}[language=json,caption={ES9+ Request Headers}]
{
  "Content-Type": "application/json",
  "User-Agent": "gsma-rsp-lpad",
  "X-Admin-Protocol": "gsma/rsp/v3.1.0"
}
\end{lstlisting}

Payload values are Base64-encoded as required by the specification. Response data is deserialized using \texttt{pydantic}. Error responses from the server trigger the appropriate exception, as explained previously.
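
As an illustration, the following sketch assembles an ES9+ \texttt{InitiateAuthentication} request without sending it. It uses only the standard library instead of \texttt{httpx}; the function name and the example host are hypothetical, and the path and field names should be verified against \gls{sgp22} before use.

\begin{lstlisting}[language=Python,caption={Sketch: building an ES9+ request body}]
import base64
import json
from typing import Dict, Tuple

HEADERS = {
    "Content-Type": "application/json",
    "User-Agent": "gsma-rsp-lpad",
    "X-Admin-Protocol": "gsma/rsp/v3.1.0",
}


def b64(data: bytes) -> str:
    """Base64-encode binary data for use as a JSON string value."""
    return base64.b64encode(data).decode("ascii")


def build_initiate_authentication(
    smdp_address: str, euicc_challenge: bytes, euicc_info_1: bytes
) -> Tuple[str, Dict[str, str], str]:
    """Return (url, headers, body) for the request."""
    url = f"https://{smdp_address}/gsma/rsp2/es9plus/initiateAuthentication"
    body = {
        "smdpAddress": smdp_address,  # plain string, not binary data
        "euiccChallenge": b64(euicc_challenge),
        "euiccInfo1": b64(euicc_info_1),
    }
    return url, dict(HEADERS), json.dumps(body)
\end{lstlisting}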

The \gls{smdpp} client is primarily used by the \gls{isdr} application to execute \gls{rsp}-related functionality.

\section{Fuzzing}
\label{sec:fuzzing}

\subsection{APDU Fuzzing}
\label{subsec:apdu_fuzzing}

% idea: use custom lpa implementation to construct valid data and mutate it before sending -> check how the responses differentiate between euicc implementations
% idea construct scenarios (pieces of code that make use of the lpa and its functions) and run them against euicc implementations
% in comparison to the tracing and compare approach: the data is generated newly for each execution with valid data i.e no exact replay
% since the generated data is valid we need to mutate it to trigger some errors and see differences -> mutation engine: handles mutation of apdus
% each function that the lpa executes is recorded before and after its mutation including the response codes
% the recordings are structured in a tree where each node repesents a function executed on the euicc with one mutation i.e each level represents one function and each node on that level a mutation (ref to figure that shows mutation tree example)
% fuzzing can be called "coverage guided" since we try every function with multiple mutations until we handled all operations implemented in the scenario -> not contiuing indefinitly and we some sense of progress
% other advantages with the tree based approach: we save each step -> can continue after failure without loosing previouse work; can potentially be easy to parallelize i.e each compute node handles one subtree of the root node etc; easier to visualize and see differences when printed as a tree


% mutation engine
% deterministic and random mutation engine
% both implement the same kind of mutation types: bitflip, random byte, zero block, shuffle block, truncate
% deteministic always modifies the same data to have conistency between function execution for different euiccs
% random engine applies the mutation to randome pieces of data
% bitflip: flips a number of bits -> number is determined based on mutation_rate and length of data $max(1, len(data) * mutation_rate)$
% random byte: switches the positions of bytes in data -> number of switches is determined in the same way as for bitflip
% zero block: set the value of all bits in a block with some length to zero
% shuffle block: data is cut into 16 bit pieces -> pieces are sorted by $sum(block) % 256$ and concatenated
% truncate: removes tail of data

% scenarios
% represents a sequence of functions that are executed on each euicc (insert listing for example scenario)
% is executed by scenario runner
% scenario runner controls scenario execution, handles errors and records data to the operation recorder
% operation recorder handles tree structure and determines which mutation is applied next
% problem: if error occurs we need to restart scenario due to statefulness of euicc and its protocols -> high overhead just for restoring state of euicc
%

% key components are the mutation engine and the operation recorder
% mutation engine: mutate a given apdu
% operation recorder: records mutation and responses, and takes care of the tree structure
% scenario runner executes all scenarios on a given euicc
% for each scenario the scenario runner initates a pcsc link with the card and resets the card (processes all notifications, euicc memory reset with all options set) -> clean and same "base" state for all scenarios
% runs all operations defined in the scenario -> operations invoke euicc commands
% euicc commands are called with the send_apdu_with_mutation function -> handles apdu transmission and mutation aswell as recording of data

% apdu mutation workflow
% ref to figure that shows scenario runner flow
% 1. mutation selection
% operation recorder handles recording and the tree structure
% operation recorder returns next mutation type to choose
% reason: we want to try every mutation for a function but when all are tried check if any of the child nodes have not tried mutations, if child node has not tried mutation: perform mutation so we get to the child node, if child node has tried all mutations: use None mutation node to continue down the tree
% to determine the next mutation: Traverses or expands a mutation tree to decide which mutation to try next. -> ref to figure that shows flow graph
% Each function call (e.g., get_euicc_info_1) becomes a node in the mutation tree.
% If untried mutations exist for the current node: Create and move to a new child node with the selected mutation type.
% If all mutations are tried: Traverse children to find another node to explore.
% 2. apdu mutation
% selected mutation is applied to the original apdu with the mutation engine
% 3. apdu transmission
% mutated apdu is send to card
% if successful: response is recorded and current node marked as success
% if error occured: error is recorded and current node is marked as failed -> no child nodes will be explored
% 4. recording mutation
% response is saved to mutation node

% Doing this for all operations in the scenario: results in a mutation tree (ref to figure that shows mutation tree example)

% error handling and retry logic
% errors and exceptions during the scenario execution are handled
% runner logs failure to the current node
% reset card (process all notifications and perform euicc memory reset with all options set) and mutation engine
% check if the card still has any untried mutations or if its fully explored -> continue with scenario or switch to new one

% saving recording
% recordings are saved for comparison between the cards and also for euicc that might be released in the future
% use python pickle to store the whole mutation tree as a ".resim" file
% afterwards: clear the recorder, reset the link, reset card -> continue with next scenario

% differential testing
% - to find differences in responses -> compare two or more recording files with each other
% - operation recorder compares mutation tree nodes and prints tree with differences
% - traverses the tree with a depth first search and compares nodes -> tree should have the same structe i.e same depth and each node has same numbers of childs
% differences in the tree strcuture are also handled i.e failed mutations and therefor no child nodes
% - nodes are considered different if the response code is different or it has a different failure reason i.e EuiccException or AssertionError (Problems occurd outside of euicc)

To uncover behavioral differences between \gls{euicc} implementations, we implemented a fuzzing framework that mutates valid \glspl{apdu} generated via our custom \gls{lpa} implementation. Unlike the tracing-and-compare approach described earlier, the fuzzing strategy dynamically constructs valid request data and intentionally mutates it prior to transmission, allowing for meaningful analysis of error-handling behavior across cards.

\subsubsection*{Fuzzing Scenarios and Execution}

Fuzzing is conducted through predefined \emph{scenarios}, that is, sequences of function calls that operate on the \gls{euicc}. Each function in a scenario interacts with the \gls{euicc} through the \gls{lpa} and is subject to mutation. The scenario runner initiates a fresh \gls{pcsc} link, resets the card into a clean state (processing all notifications and performing a full memory reset), and executes each function with multiple mutations.

This process is guided by an \textbf{operation recorder} that tracks each function call, applied mutations, and resulting responses in a structured \emph{mutation tree}. Each tree node represents a specific function call executed with one type of mutation. A tree level corresponds to a function in the scenario; sibling nodes represent different mutations of that function.

\subsubsection*{Mutation Engine}
\label{subsubsec:mutation_engine}

The mutation engine supports both \textit{deterministic} and \textit{random} mutation modes, and implements the following strategies:

\begin{itemize}
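  \item \textbf{Bit flip:} Flips $\max(1, n \cdot r)$ bits, where $n$ is the length of the data and $r$ is the mutation rate.
  \item \textbf{Random byte:} Swaps the positions of randomly chosen bytes; the number of swaps is determined in the same way as for bit flips.
  \item \textbf{Zero block:} Replaces a sequence of bytes with zeros.
  \item \textbf{Shuffle block:} Splits the data into 16-bit blocks, sorts the blocks by the sum of their bytes modulo 256, and concatenates them.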
  \item \textbf{Truncate:} Removes the tail of the \gls{apdu}.
\end{itemize}

Deterministic mode always mutates the same positions, which makes results reproducible and comparable across \glspl{euicc}, while random mode selects the mutated positions at random. This allows us to explore both fixed and variable fuzzing behavior. Both modes behave similarly to the deterministic and non-deterministic mutation modes used in AFLPlusPlus.
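
A compact sketch of these strategies is given below. It is not the engine used in this thesis: determinism is modeled here by a fixed seed, the mutation count is rounded down, and the block size and truncation ratio are free parameters chosen for the example.

\begin{lstlisting}[language=Python,caption={Sketch: seeded mutation strategies}]
import random
from typing import Optional


class MutationEngine:
    """Seeded for deterministic mode, unseeded for random mode."""

    def __init__(self, mutation_rate: float = 0.01, seed: Optional[int] = None):
        self.mutation_rate = mutation_rate
        self.seed = seed

    def _rng(self) -> random.Random:
        # A fresh generator per call keeps seeded results independent of call order.
        return random.Random(self.seed) if self.seed is not None else random.Random()

    def _count(self, data: bytes) -> int:
        return max(1, int(len(data) * self.mutation_rate))

    def bitflip(self, data: bytes) -> bytes:
        if not data:
            return data
        out = bytearray(data)
        count = min(self._count(data), len(out) * 8)
        for position in self._rng().sample(range(len(out) * 8), count):
            out[position // 8] ^= 1 << (position % 8)
        return bytes(out)

    def random_byte(self, data: bytes) -> bytes:
        if len(data) < 2:
            return data
        rng, out = self._rng(), bytearray(data)
        for _ in range(self._count(data)):
            i, j = rng.sample(range(len(out)), 2)
            out[i], out[j] = out[j], out[i]  # swap two byte positions
        return bytes(out)

    def zero_block(self, data: bytes, block_size: int = 4) -> bytes:
        if not data:
            return data
        out = bytearray(data)
        start = self._rng().randrange(len(out))
        end = min(len(out), start + block_size)
        out[start:end] = bytes(end - start)
        return bytes(out)

    def shuffle_block(self, data: bytes) -> bytes:
        blocks = [data[i:i + 2] for i in range(0, len(data), 2)]
        return b"".join(sorted(blocks, key=lambda block: sum(block) % 256))

    def truncate(self, data: bytes, keep: float = 0.5) -> bytes:
        return data[: max(1, int(len(data) * keep))] if data else data
\end{lstlisting}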

\subsubsection*{Fuzzing Workflow}

The \gls{apdu} fuzzing workflow is illustrated in \cref{fig:scenario_flow} and proceeds in four main steps:

\begin{enumerate}
  \item \textbf{Mutation selection:} The operation recorder decides the next mutation to apply based on a depth-first traversal of the mutation tree. If all mutations for the current function are exhausted, the recorder searches the child nodes for mutations that have not yet been tried.
  \item \textbf{\gls{apdu} mutation:} The selected mutation is applied to the original \gls{apdu} using the mutation engine.
  \item \textbf{\gls{apdu} transmission:} The mutated \gls{apdu} is sent to the card. Success or failure is recorded in the current node.
  \item \textbf{Recording:} The response (or exception) is saved to the mutation tree node for further analysis.
\end{enumerate}

This process repeats for all functions defined in the scenario, resulting in a complete mutation tree (see \cref{fig:tree_structure}) that captures all inputs, outputs, and error states.
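
One iteration of the four steps can be sketched as follows. The sketch is deliberately reduced: it only selects among untried mutations at the current node and omits the traversal of already explored subtrees. The node fields, the list of mutation names, and the \texttt{mutate} and \texttt{transmit} callbacks are assumptions made for this example.

\begin{lstlisting}[language=Python,caption={Sketch: one fuzzing step on the mutation tree}]
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional

MUTATIONS = ["none", "bitflip", "random_byte", "zero_block",
             "shuffle_block", "truncate"]


@dataclass
class MutationNode:
    function: str
    mutation: str
    response: Optional[int] = None
    error: Optional[str] = None
    failed: bool = False
    children: Dict[str, "MutationNode"] = field(default_factory=dict)


def next_mutation(node: MutationNode) -> Optional[str]:
    """Return the first mutation type without a child node, else None."""
    for mutation in MUTATIONS:
        if mutation not in node.children:
            return mutation
    return None


def fuzz_step(
    node: MutationNode,
    function: str,
    apdu: bytes,
    mutate: Callable[[str, bytes], bytes],
    transmit: Callable[[bytes], int],
) -> MutationNode:
    """Run the four workflow steps once and return the new child node."""
    mutation = next_mutation(node)            # 1. mutation selection
    if mutation is None:
        raise LookupError("all mutations tried at this node")
    child = MutationNode(function, mutation)
    node.children[mutation] = child
    mutated = mutate(mutation, apdu)          # 2. APDU mutation
    try:
        child.response = transmit(mutated)    # 3. APDU transmission
    except Exception as error:
        child.failed = True                   # failed nodes get no children
        child.error = repr(error)
    return child                              # 4. result is kept in the node
\end{lstlisting}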

\begin{figure}
	\centering
    \input{Graphics/mutation_tree.tikz}
    \caption{Tree structure of recording.}
    \label{fig:tree_structure}
\end{figure}

\subsubsection*{Determine Next Mutation Logic}
% shown in figure4 (flow graph on how to determine next mutation)
% goals we want to try all mutations for each node
% handled by operation recorder and next mutation is requeststed by pcsc link
% information that we have is the current node
% 1. check if the current node still has some untried mutations i.e if the node has one child for each possible mutations
% if so we create a new node with the not yet tried mutation type and make it the new current mode and return the mutation type that we just applied to this node
% if not: we continue to traverse each child of the current node and check if they have any untried mutations left
% if so return the mutation type of the child that still has not tried mutation types -> brings us on the subtree where a child has not tried mutations -> next time this function is called we return the new not tried mutation type
% if child does not have any not tried mutations: we return the NoneNode of that child i.e the mutation type of the child that was successfully executed and did not make any mutations. idea: continue down the good path to find untried mutations

The decision process for selecting the next mutation to apply is a key component of the fuzzing framework and is handled entirely by the \texttt{OperationRecorder}. Its responsibility is to ensure that all mutations are eventually applied to each function within a scenario while maintaining a consistent and deterministic traversal order across runs.

\begin{figure}
	\centering
    \input{Graphics/determine_next_mutation_flow.tikz}
    \caption{Flow for determining the next mutation to apply.}
    \label{fig:next_mutation_flow}
\end{figure}

The algorithm, illustrated in \cref{fig:next_mutation_flow}, operates based on the current node in the mutation tree. Each node represents a function invocation, and its children represent the same invocation with different mutations. The logic proceeds as follows:

\begin{enumerate}
  \item \textbf{Check for untried mutations at the current node:}
  The recorder checks whether the current node already has a child node for every defined mutation type (e.g., bitflip, zero-block, or truncate). If there are untried mutation types, it selects one of them, creates a new child node with that mutation, sets it as the new current node, and returns the selected mutation type.

  \item \textbf{Recursive traversal of child nodes:}
  If all mutation types have already been tried at the current node (i.e., all child mutations are present), the recorder traverses the subtree rooted at each child node. For each child, it checks if there are any untried mutations deeper in the tree.

  \item \textbf{Descent via valid (None) paths:}
  If no untried mutations are found among the children, the recorder follows the \texttt{NoneNode} child, which represents the unmutated, successful execution of the function. This path is presumed to lead to deeper parts of the tree where further mutations may be unexplored. Descending along this ``clean'' path thus allows the recorder to reach branches that still contain untested mutations.

  \item \textbf{Backtrack or complete:}
  If the entire subtree from the current node has been fully explored (i.e., all mutations at all levels are exhausted), the recorder signals completion by returning a sentinel (e.g., \texttt{None}) to the scenario runner.
\end{enumerate}

This strategy is both exhaustive and progress-aware. It ensures that:
\begin{itemize}
  \item All mutation types are attempted for every scenario function.
  \item Tree traversal avoids redundant work and naturally prioritizes unexplored paths.
  \item The fuzzing process remains deterministic and resumable due to the structured tree format.
\end{itemize}
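
The selection logic above can be sketched as follows. This is a simplified illustration, not the actual \texttt{OperationRecorder}: the class \texttt{Recorder}, the method \texttt{restart}, the mutation names, and the string \texttt{"none"} for the unmutated path are assumptions. The sketch further assumes that a scenario continues after a mutated call and that the subtree check is fully recursive, so the descent along the unmutated path is covered by the same check.

\begin{lstlisting}[language=Python, caption={Illustrative sketch of the next-mutation selection (hypothetical names).}]
from dataclasses import dataclass, field
from typing import Optional

MUTATIONS = ("none", "bitflip", "truncate")  # "none" = unmutated call


@dataclass
class Node:
    mutation: str
    depth: int
    children: dict = field(default_factory=dict)


def has_untried(node: Node, max_depth: int) -> bool:
    """True if the subtree below node still lacks some mutation."""
    if node.depth >= max_depth:
        return False
    if len(node.children) < len(MUTATIONS):
        return True
    return any(has_untried(c, max_depth) for c in node.children.values())


class Recorder:
    def __init__(self, n_functions: int) -> None:
        self.n_functions = n_functions
        self.root = Node("root", 0)
        self.current = self.root

    def restart(self) -> None:
        """Begin a new scenario run at the root of the tree."""
        self.current = self.root

    def next_mutation(self) -> Optional[str]:
        node = self.current
        if node.depth >= self.n_functions:
            return None
        for mutation in MUTATIONS:            # untried at current node
            if mutation not in node.children:
                child = Node(mutation, node.depth + 1)
                node.children[mutation] = child
                self.current = child
                return mutation
        for child in node.children.values():  # untried deeper in the tree
            if has_untried(child, self.n_functions):
                self.current = child
                return child.mutation
        return None                           # subtree fully explored
\end{lstlisting}

With two functions and three mutation types, repeatedly restarting and querying this sketch yields each of the nine mutation sequences exactly once before the sentinel is returned.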

\subsubsection*{Error Handling and Retry Logic}

Errors during execution are logged and associated with the current mutation node. If a function fails (e.g., due to protocol state loss or card reset), the runner resets the \gls{pcsc} link and the card, then resumes execution. This ensures that failures do not corrupt the mutation tree and allows exploration to continue.
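
A minimal sketch of this recovery step is shown below. The callables \texttt{reset\_link} and \texttt{reset\_card} as well as the error list are hypothetical placeholders for the corresponding runner functionality.

\begin{lstlisting}[language=Python, caption={Illustrative sketch of error recording and reset (hypothetical names).}]
from typing import Callable, List


def execute_function(func: Callable[[], None], node_errors: List[str],
                     reset_link: Callable[[], None],
                     reset_card: Callable[[], None]) -> bool:
    """Run one scenario function; on failure, record it and reset."""
    try:
        func()
        return True
    except Exception as exc:
        # Attach the failure to the current mutation node.
        node_errors.append(f"{type(exc).__name__}: {exc}")
        # Restore a usable state before execution resumes.
        reset_link()
        reset_card()
        return False
\end{lstlisting}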

\subsubsection*{Scenario Persistence and Reuse}

To preserve fuzzing results, the entire mutation tree is serialized and stored using Python's \texttt{pickle} module in a \texttt{.resim} file. This enables post-analysis, comparison across card models, and reproducibility for future \gls{euicc} versions.
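
Saving and loading a recording can be sketched as follows. The function names are assumptions, and any picklable object can stand in for the mutation tree. Note that \texttt{pickle} can execute arbitrary code during loading, so only trusted \texttt{.resim} files should be opened.

\begin{lstlisting}[language=Python, caption={Illustrative sketch of persisting a mutation tree with \texttt{pickle} (hypothetical function names).}]
import pickle
from pathlib import Path
from typing import Any


def save_tree(tree: Any, path: str) -> None:
    """Serialize the mutation tree to a .resim file."""
    with Path(path).open("wb") as handle:
        pickle.dump(tree, handle, protocol=pickle.HIGHEST_PROTOCOL)


def load_tree(path: str) -> Any:
    """Load a previously stored tree; only use with trusted files."""
    with Path(path).open("rb") as handle:
        return pickle.load(handle)
\end{lstlisting}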

\subsubsection*{Differential Testing}

After multiple cards are fuzzed with the same scenario, their corresponding mutation trees are compared to identify behavioral discrepancies. This is done via depth-first traversal of the trees:

\begin{itemize}
  \item Trees must have equivalent structure (same function call order and mutation types).
  \item Nodes are flagged as different if their response status word differs or if the failure reason (e.g. \texttt{EuiccException}, \texttt{AssertionError}) is inconsistent.
  \item Discrepancies due to failed mutations (missing child nodes) are also handled gracefully.
\end{itemize}

This differential testing method highlights edge-case inconsistencies across \gls{euicc} vendors and enables systematic validation of compliance with the \gls{rsp} protocol.
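
The comparison can be sketched as a recursive depth-first traversal of two trees. The \texttt{Node} fields and the textual report format are assumptions made for illustration; in particular, reporting a missing child as a difference is only one possible way to handle it.

\begin{lstlisting}[language=Python, caption={Illustrative sketch of comparing two mutation trees (hypothetical names).}]
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    name: str
    status_word: Optional[str] = None
    failure: Optional[str] = None
    children: dict = field(default_factory=dict)


def diff_trees(a: Node, b: Node, path: str = "") -> List[str]:
    """Collect differences between two recordings, depth first."""
    here = f"{path}/{a.name}"
    diffs = []
    if a.status_word != b.status_word:
        diffs.append(f"{here}: status word {a.status_word} != {b.status_word}")
    if a.failure != b.failure:
        diffs.append(f"{here}: failure {a.failure} != {b.failure}")
    for key in sorted(set(a.children) | set(b.children)):
        child_a, child_b = a.children.get(key), b.children.get(key)
        if child_a is None or child_b is None:
            side = "first" if child_a is None else "second"
            diffs.append(f"{here}/{key}: missing in {side} tree")
            continue
        diffs.extend(diff_trees(child_a, child_b, here))
    return diffs
\end{lstlisting}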

\begin{figure}
	\centering
    \input{Graphics/record_scenario_flow.tikz}
    \caption{Flow for recording a scenario.}
    \label{fig:scenario_flow}
\end{figure}


\subsection{Data Fuzzing}
\label{subsec:data_fuzzing}

% Problem with apdu fuzzing: often producing not valid ASN1 structured data due to bit flips etc -> need to fuzz input data
% goal leverage lpa to fuzz input data and see how the esim behaves
% oss-fuzz global project with the goal to make open source software more secure and stable by using fuzzing teqniques (ref oss-fuzz)
% oss-fuzz proposes multiple fuzzer for python implementation among them are the atheris fuzzer aswell as the python hypothesis framework
% atheris coverage guided fuzzer to fuzz python code directly -> not our use case
% python hypothesis: propery based testing library for python -> define input structure which needs to be passed into function -> hypothesis generates random input data that adheres to defined structure and tests for edge cases i.e. fuzzing

% Defining the fuzzing strategies
% hypothesis uses strategies to define the input structure
% add decorator `@given` with a strategy as argument
% strategy can be `strategy.integers()` to define that the first argument a1 of the function accepts integers Z
% this would result in the function beeing called with a1 having the values [0, -18588, -672780074, 32616, ...]
% hypothesis does not try every possible input value but resticts itself to edge cases when not adjusting the max_sample size
% when finding failing examples hypthesis may call the function a few more times to determine the most simple possible arguments that causes the functions to fail

% for out tests we only implement fuzzing strategies for the euicc side of the rsp since we rely on external infrastructure for the smdp+ and to not want to fuzz this part of the rsp -> could be seen as a ddos attack or just overhelm their servers with fuzzing traffic
% implemented fuzzing strategies for the following functions (restricted to functions which require arguments)
% ES10a: SetDefaultDpAddress
% ES10b: PrepareDownload, LoadBoundProfilePackage, AuthenticateServer
% ES10c: GetProfileInfo, EnableProfile, DisableProfile, DeleteProfile, eUICCMemoryReset, SetNickname

% Example implementation for the ES10c GetProfileInfo function
% @given(
%         use_iccid=st.booleans(),
%         profile_class=st.one_of(st.integers(min_value=-20, max_value=20), st.none()),
%         tags=st.one_of(
%             st.lists(
%                 st.text(
%                     min_size=2,
%                     max_size=8,
%                     alphabet=string.hexdigits,
%                 ).filter(lambda s: len(s) % 2 == 0),
%                 min_size=1,
%                 max_size=5,
%             ),
%             st.binary(max_size=8),
%         ),
%     )
%     def test_get_profiles(
%         self,
%         use_iccid: bool,
%         profile_class: str | None,
%         tags: list[str] | bytes | None,
%     ):
%         ...
%
% Based on this ASN1 definition from the sg22 specs
% ProfileInfoListRequest ::= [45] SEQUENCE { -- Tag 'BF2D'
%     searchCriteria [0] CHOICE {
%         isdpAid        [APPLICATION 15] OctetTo16, -- AID of the ISD-P, tag '4F'
%         iccid          Iccid, -- ICCID, tag '5A'
%         profileClass   [21] ProfileClass -- Tag '95'
%     } OPTIONAL,
%     tagList        [APPLICATION 28] OCTET STRING OPTIONAL -- tag '5C'
% }

% for executing the fuzzer we use pytest: hypothesis integrates with pytest
% during the setUpClass process i.e before starting a group of fuzzing tests which are bundled into a class, we setup the card by iniating a pcsc link and perform various other steps that are need in order to fuzz the card i.e install a profile in order for the fuzzer to test the enable profile function.
% when finishing a class of fuzzing tests, we reset the euicc with the eUICCMemoryReset function were all reset options are set and process all leftover notifications -> leave the card in a clean state for other fuzzing tests

% SGP22 defines UndefinedError for some functions
% since the implementation is setup to exclude all EuiccExceptions -> if we receive an euiccException most of the time this means that the euicc handled the erorr internally and responseded with an error code
% this is still an error but one that is handled properly and does not mean that we discovered a bug but rather that the input was not valid
% on the other hand an undefined error is still handled be the euicc but could not be properly handled -> could mean that there is a potential bug in the implementation and we need to do some further investigation into to this particular function call
% -> euicc exceptions are ignored unless they are an UndefinedError

While \gls{apdu}-level fuzzing (see \cref{subsec:apdu_fuzzing}) is useful for evaluating command behavior across different \gls{euicc} implementations, it has the drawback that random mutations, particularly at the bit or byte level, often invalidate the structured \gls{asn1} encoding. As a result, many \gls{apdu} mutations are immediately rejected as malformed, which limits the coverage and effectiveness of the test campaign.

To address this limitation, we introduce a complementary \textit{data fuzzing} approach that operates at the semantic level by fuzzing the input arguments of high-level \gls{lpa} function calls. This enables us to maintain structural validity while still exercising a wide variety of edge cases in the data provided to the \gls{euicc}. Our implementation builds on property-based testing frameworks designed for Python, in particular the \texttt{hypothesis} library~\cite{maciver_hypothesis_2019}.

\paragraph{Fuzzing with Hypothesis}
Hypothesis is a property-based testing framework, which allows developers to define \textit{strategies} for input data. The framework then generates test cases based on these strategies and attempts to explore edge cases through randomized sampling and shrinking. Unlike traditional random fuzzing, Hypothesis ensures that generated inputs conform to the structural invariants defined by the strategy, thereby increasing the likelihood of discovering subtle logic errors in protocol handling.

Hypothesis integrates with \texttt{pytest} and uses the \texttt{@given} decorator to specify input generation strategies. As an example, the \gls{sgp22} specification defines the following \gls{asn1} structure for the \texttt{GetProfileInfo} function:

\begin{lstlisting}[caption={ASN.1 definition of the ProfileInfoListRequest}]
ProfileInfoListRequest ::= [45] SEQUENCE {
    searchCriteria [0] CHOICE {
        isdpAid        [APPLICATION 15] OctetTo16,
        iccid          Iccid,
        profileClass   [21] ProfileClass
    } OPTIONAL,
    tagList        [APPLICATION 28] OCTET STRING OPTIONAL
}
\end{lstlisting}

We define the following Hypothesis test for the Python implementation of \texttt{GetProfileInfo}:

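\begin{lstlisting}[language=Python, caption={Hypothesis-based fuzzing of the \texttt{GetProfileInfo} function, which sends a \texttt{ProfileInfoListRequest}.}]
import string

from hypothesis import given
from hypothesis import strategies as st


# Method of a fuzzing test class (class definition omitted).
@given(
    # Toggle the ICCID search criterion.
    use_iccid=st.booleans(),
    # Small positive and negative integers, or no profile class at all.
    profile_class=st.one_of(st.integers(min_value=-20, max_value=20), st.none()),
    # Either a list of even-length hex strings or raw bytes.
    tags=st.one_of(
        st.lists(
            st.text(
                min_size=2,
                max_size=8,
                alphabet=string.hexdigits,
            ).filter(lambda s: len(s) % 2 == 0),
            min_size=1,
            max_size=5,
        ),
        st.binary(max_size=8),
    ),
)
def test_get_profiles(self, use_iccid, profile_class, tags):
    ...
\end{lstlisting}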

This approach preserves the semantics and structure of the expected \gls{asn1} types while still allowing a wide variety of edge cases to be exercised.

\paragraph{Implementation Scope}
Because we rely on external infrastructure, such as the \gls{smdpp} server, our fuzzing campaign focuses exclusively on the \gls{euicc} side of the \gls{rsp} protocol. Fuzzing requests directed at the \gls{smdpp} may lead to excessive traffic and could be misinterpreted as \gls{dos} attempts. Therefore, our tests are restricted to those functions in the ES10a, ES10b, and ES10c interfaces that accept structured input arguments and interact directly with the \gls{euicc}.

Specifically, we implemented fuzzing tests for the following functions:
\begin{itemize}
    \item \textbf{ES10a:} \texttt{SetDefaultDpAddress}
    \item \textbf{ES10b:} \texttt{PrepareDownload}, \texttt{LoadBoundProfilePackage}, \texttt{AuthenticateServer}
    \item \textbf{ES10c:} \texttt{GetProfileInfo}, \texttt{EnableProfile}, \texttt{DisableProfile}, \texttt{DeleteProfile}, \texttt{eUICCMemoryReset}, \texttt{SetNickname}
\end{itemize}

\paragraph{Fuzzing Lifecycle}
Each fuzzing test class is executed under a shared fixture. During the \texttt{setUpClass} phase, a \gls{pcsc} link is initialized, and the \gls{euicc} is prepared (e.g., by installing a test profile) to ensure the preconditions for each function are met. After executing the class's test suite, the \texttt{eUICCMemoryReset} function is called with all reset options enabled to restore a clean state. All leftover notifications are processed to leave the card in a consistent state for subsequent tests.
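
The lifecycle can be sketched with the class-level fixtures of \texttt{unittest}. The class \texttt{FakeCard} and its methods are hypothetical stand-ins for the real \gls{pcsc}-backed \gls{euicc} and are not part of the actual implementation.

\begin{lstlisting}[language=Python, caption={Illustrative sketch of the class-level setup and teardown (hypothetical names).}]
import unittest


class FakeCard:
    """Stand-in for the real card (hypothetical)."""

    def __init__(self) -> None:
        self.profiles = []
        self.notifications = []

    def install_test_profile(self) -> None:
        self.profiles.append("test-profile")
        self.notifications.append("install")

    def memory_reset(self) -> None:
        self.profiles.clear()
        self.notifications.append("reset")

    def process_notifications(self) -> None:
        self.notifications.clear()


class TestEnableProfile(unittest.TestCase):
    @classmethod
    def setUpClass(cls) -> None:
        # Runs once before all tests of the class.
        cls.card = FakeCard()
        cls.card.install_test_profile()

    @classmethod
    def tearDownClass(cls) -> None:
        # Runs once after all tests: leave the card in a clean state.
        cls.card.memory_reset()
        cls.card.process_notifications()

    def test_profile_is_available(self) -> None:
        self.assertIn("test-profile", self.card.profiles)
\end{lstlisting}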

\paragraph{Error Classification}
According to the \gls{sgp22} specification, many functions may return a generic \texttt{UndefinedError} in response to unexpected or malformed input. In our implementation, exceptions raised by the \gls{euicc} that map to well-defined error codes (i.e., subclasses of \texttt{EuiccException}) are not treated as test failures. These represent handled errors indicating that the input was invalid but the card responded appropriately.

By contrast, when an \texttt{UndefinedError} is returned, we treat this as a potential indicator of an unhandled internal error or inconsistent implementation behavior. These cases are flagged for further investigation. Additionally, exceptions occurring outside the \gls{euicc}, such as Python \texttt{AssertionError}s or test harness failures, are treated as bugs in the testing infrastructure and are logged separately.
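
This classification can be sketched as follows. The exception hierarchy is an assumption: \texttt{UndefinedError} is modeled as a subclass of \texttt{EuiccException}, \texttt{ProfileNotFound} is a hypothetical example of a well-defined error, and the returned labels are illustrative.

\begin{lstlisting}[language=Python, caption={Illustrative sketch of the error classification (assumed exception hierarchy).}]
from typing import Optional


class EuiccException(Exception):
    """Error reported by the card with a defined error code."""


class UndefinedError(EuiccException):
    """Generic error without a specific cause (assumed subclass)."""


class ProfileNotFound(EuiccException):
    """Hypothetical example of a well-defined error."""


def classify(exc: Optional[BaseException]) -> str:
    if exc is None:
        return "ok"
    # Check the more specific class first.
    if isinstance(exc, UndefinedError):
        return "investigate"
    if isinstance(exc, EuiccException):
        return "handled"
    # Anything else originates outside the card.
    return "harness"
\end{lstlisting}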

\paragraph{Conclusion}
By combining property-based data generation with structural knowledge of \gls{asn1} types, we extend the fuzzing coverage of the \gls{euicc} interface beyond what is possible with \gls{apdu} mutation alone. This enables the discovery of semantic inconsistencies and unhandled corner cases in \gls{euicc} implementations, especially when compared across different vendors during differential testing as shown in \cref{subsec:differential_testing}.


\section{CLI}
\label{sec:cli}

% implementation can be used as a library for scripting with the lpa
% easier to use for most users is a cli -> make the functionality available as a cli i.e tracing, lpa, and fuzzing
% faster to record new esims for the apdu fuzzing etc
% to implement argument parsing we use the standard library argparse together with argcomplete for console autocompletion and rich for prettier output printing

% structure
% cli is structure in 3 main functions tracing, lpa, and fuzzing
% tracer (as shown in tracing section) handles the tracing of apdus between the simtrac2 and lpa
% lpa makes all the euicc, notification and profile handling functions available via a cli
% fuzz is split up into data fuzzing and apdu fuzzing aswell as compare -> enables comparison of two or more recordings and prints them out as json (json represents tree structure and shows differences)
% data fuzzing basically wraps pytest and executes all tests for each test class
%
% structure is defined by the folders
% each subfolder represents a new kind of functionality i.e fuzz or lpa -> see figure that shows tree structure
% the __init__ folders implement a run() function in which the args are parsed and distributed to the corresponding handling functions -> distribution also calls run() function with the args of the corresponding handling function -> see listing
% parser definiction happens on the handler level and global arguments are difined in the __init__ file in which also the subparsers are attached to the parent parser

% extensibility
% easy to extend thanks to its modular sturcuture
% completly independet from library -> library does not depend on cli


While the implemented library provides a programmatic interface to the \gls{lpa} and \gls{euicc} operations, many users, especially testers and engineers, require a more accessible way of interacting with the system. For this reason, we provide a \gls{cli} that exposes all major functionalities of the system, including \gls{apdu} tracing, \gls{lpa} operations, and fuzzing workflows.

The \gls{cli} is built using Python’s standard \texttt{argparse} module for argument parsing, extended with \texttt{argcomplete} to enable shell auto-completion. For improved readability and formatting of terminal output, the \texttt{rich} library is used.

\subsection*{Structure}
The \gls{cli} is organized around three core command groups:
\begin{itemize}
    \item \textbf{tracing} (subcommand \texttt{trace}): Interfaces with the \texttt{simtrace2} device and the \gls{lpa} to capture \gls{apdu} traffic in real time. This functionality is discussed in detail in \cref{sec:tracing}.
    \item \textbf{lpa}: Exposes \gls{euicc} communication features such as profile management, notification handling, and remote procedure execution via the \gls{cli}.
    \item \textbf{fuzzing}: Wraps both \gls{apdu}-level and data-level fuzzing. Additionally, it provides a \texttt{compare} command that compares two or more fuzzing recordings and highlights their differences in \gls{json} format.
\end{itemize}

Each of these command groups is implemented in its own subfolder (\texttt{trace/}, \texttt{lpa/}, and \texttt{fuzzer/}). The modular structure of the \gls{cli} is shown in \cref{fig:cli_structure}, which illustrates how the subcommands map to the file and folder hierarchy of the project.

\begin{figure}[h]
    \centering
    \begin{forest}
  for tree={
    font=\ttfamily,
    grow'=0,
    child anchor=west,
    parent anchor=south,
    anchor=west,
    calign=first,
    inner sep=1pt,
    l=1.5em,
    s sep=3pt,
    edge path={
      \noexpand\path [draw, \forestoption{edge}]
      (!u.south west) +(3.5pt,0) |- node[fill,inner sep=1.25pt] {} (.child anchor)\forestoption{edge label};
    },
    before typesetting nodes={
      if n=1
        {insert before={[,phantom]}}
        {}
    },
    fit=band,
    before computing xy={l=15pt},
  }
  [cli
    [\_\_init\_\_.py]
    [fuzzer
      [\_\_init\_\_.py]
      [apdu\_fuzzer.py]
      [compare.py]
      [data\_fuzzer.py]
    ]
    [trace
      [\_\_init\_\_.py]
      [record.py]
      [replay.py]
    ]
    [lpa
      [\_\_init\_\_.py]
      [euicc.py]
      [notification.py]
      [profile.py]
    ]
  ]
    \end{forest}
    \caption{CLI folder structure and modular separation of functionality}
    \label{fig:cli_structure}
\end{figure}

\paragraph{Dispatch and Argument Parsing}
Each submodule implements a \texttt{run()} function that receives the parsed arguments and dispatches execution to the appropriate internal handler. At the top level, the root \texttt{\_\_init\_\_.py} file defines the global parser, registers the subcommands via subparsers, and handles any global options. \Cref{lst:cli_parser} shows this pattern for the \texttt{trace} command group.


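\begin{lstlisting}[language=Python, caption={CLI dispatch pattern, shown for the \texttt{trace} command group}, label={lst:cli_parser}]
import argparse

# Assumption: RichHelpFormatter is provided by the rich-argparse package.
from rich_argparse import RichHelpFormatter

from . import record, replay


def add_subparser(parent_parser: argparse._SubParsersAction) -> None:
    """Register the trace command and its subcommands."""
    trace_parser: argparse.ArgumentParser = parent_parser.add_parser(
        "trace",
        help="Trace-level operations (record, replay)",
        formatter_class=RichHelpFormatter,
    )
    trace_subparsers = trace_parser.add_subparsers(dest="trace_command", required=True)
    # Each handler module attaches its own parser.
    record.add_subparser(trace_subparsers)
    replay.add_subparser(trace_subparsers)


def run(args: argparse.Namespace) -> None:
    """Dispatch to the handler selected on the command line."""
    if args.trace_command == "record":
        record.run(args)
    elif args.trace_command == "replay":
        replay.run(args)
\end{lstlisting}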


Each subcommand module (e.g., \texttt{fuzzer/data\_fuzzer.py}) provides its own parser configuration and encapsulated logic, adhering to a clearly defined interface.
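
A handler module that follows this interface could look like the following sketch. The \texttt{--output} option, its default value, the help texts, and the helper \texttt{build\_parser} are hypothetical and serve only to make the example self-contained.

\begin{lstlisting}[language=Python, caption={Illustrative sketch of a handler module (hypothetical options).}]
import argparse


def add_subparser(parent_parser) -> None:
    """Attach the handler's parser to its command group."""
    parser = parent_parser.add_parser("record", help="Record APDU traffic")
    # Hypothetical option, used only for illustration.
    parser.add_argument("--output", default="trace.out", help="Output file")


def run(args: argparse.Namespace) -> str:
    """Handler entry point; returns a summary instead of doing real work."""
    return f"recording to {args.output}"


def build_parser() -> argparse.ArgumentParser:
    """Minimal parent parser so the sketch can be run on its own."""
    root = argparse.ArgumentParser(prog="cli")
    groups = root.add_subparsers(dest="trace_command", required=True)
    add_subparser(groups)
    return root
\end{lstlisting}

Parsing \texttt{["record", "--output", "a.bin"]} with this parser sets \texttt{trace\_command} to \texttt{record}, so the dispatcher can forward the namespace to \texttt{run()}.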

\paragraph{Integration with Pytest}
The data fuzzing component internally wraps \texttt{pytest}, leveraging the structure of Python test classes defined with the Hypothesis framework (cf.~\cref{subsec:data_fuzzing}). Each test class corresponds to a group of \gls{rsp} commands. By invoking the data fuzzing \gls{cli}, all available test classes are executed against the connected \gls{euicc}, with proper initialization and teardown logic handled automatically.
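
One way to wrap \texttt{pytest} programmatically is its \texttt{pytest.main} entry point, as sketched below. The function names, the test directory argument, and the selection of test classes via \texttt{-k} are assumptions and may differ from the actual implementation.

\begin{lstlisting}[language=Python, caption={Illustrative sketch of invoking \texttt{pytest} from the CLI (hypothetical names).}]
from typing import List, Sequence


def build_pytest_args(test_dir: str, classes: Sequence[str] = (),
                      verbose: bool = False) -> List[str]:
    """Translate CLI options into a pytest argument list."""
    args = [test_dir]
    if classes:
        # Restrict the run to the named test classes.
        args += ["-k", " or ".join(classes)]
    if verbose:
        args.append("-v")
    return args


def run_data_fuzzing(test_dir: str, classes: Sequence[str] = ()) -> int:
    """Run the fuzzing tests and return the pytest exit code."""
    import pytest  # third-party dependency, imported on demand

    return int(pytest.main(build_pytest_args(test_dir, classes)))
\end{lstlisting}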

\subsection*{Extensibility}
The \gls{cli} is designed with extensibility as a primary concern. Adding new commands requires minimal effort: developers only need to create a new subfolder, define a \texttt{run()} function, and register the new command in the main \gls{cli} dispatcher. Moreover, the core library is completely independent of the \gls{cli}: the \gls{cli} builds on the library, but the library does not depend on the \gls{cli}, so library users are not forced to install or use the \gls{cli} subsystem.