% !TeX root = ../Thesis.tex
%************************************************
\chapter{Design}\label{ch:design}
%************************************************
\glsresetall % Resets all acronyms to not used

This section introduces the core design of our differential testing framework \sysname, which supports systematic analysis of multiple commercial and test \glspl{euicc}. The goal is to provide a flexible and extensible platform capable of:
\begin{itemize}
\item Reproducing and replaying real-world interactions between \glspl{euicc} and \glspl{lpa},
\item Mutating protocol-level inputs to explore edge cases and verify robustness,
\item Comparing card responses under similar inputs to identify differences.
\end{itemize}
\marginpar{\sysname is the differential testing framework for eSIM analysis.}

To achieve these goals, we propose a modular three-layered architecture:

\begin{enumerate}
\item \textbf{Tracing and Replay}
\item \textbf{APDU Fuzzing}
\item \textbf{Data Fuzzing}
\end{enumerate}

\section{Design 1: Tracing and Replay}
\label{subsec:design_1}

The first design focuses on capturing and replaying real interaction sequences between an \gls{lpa} and a target \gls{euicc}. This allows deterministic replay of recorded \gls{apdu} sequences on different cards for side-by-side comparison.

\paragraph{Design Rationale.} Real-world traces provide insights into how \glspl{euicc} are used in practice, including undocumented behavior not covered by specifications. Replaying identical \gls{apdu} sequences across cards enables direct differential testing. To ensure realistic conditions, the setup is designed to remain as close as possible to the original communication path between \gls{lpa} and \gls{euicc}.
\marginpar{Design 1 replays real LPA–eUICC interactions for side-by-side card comparison.}

\paragraph{Key Components.}
\begin{itemize}
\item \textbf{Passive Tracing:} A tracing module passively intercepts \gls{apdu} exchanges over a physical interface using \texttt{simtrace2}. Commands are partially classified and tagged with functional metadata, such as the selection of the \gls{isdr}.
\item \textbf{Structured Recording:} Each session is recorded along with metadata, including command classifications, source and target \glspl{aid}, and session context.
\item \textbf{Replay Engine:} Captured traces are injected into a session between an \gls{lpa} and an \gls{euicc}. The engine adjusts session-specific fields (\eg, \glspl{aid}) and flags diverging behavior in response status words or payloads; a sketch of this comparison step follows the list.
\end{itemize}
\end{itemize}
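
To make the comparison step concrete, the following Python sketch shows how a recorded trace could be replayed against two cards and the responses diffed. It is a minimal sketch rather than the framework's definitive implementation: the \texttt{transmit} helper and the trace-entry fields are hypothetical stand-ins for the card-access layer and the recording format.

\begin{verbatim}
# Minimal replay-and-compare sketch. `transmit` and the trace-entry
# fields are hypothetical stand-ins for the card-access layer;
# `transmit` is assumed to return (response bytes, status word int).

def replay_and_compare(trace, card_a, card_b, transmit):
    """Replay recorded APDUs on two cards and flag divergences."""
    divergences = []
    for entry in trace:                       # one classified APDU per entry
        apdu = entry["apdu"]                  # raw command bytes; session-
                                              # specific fields (e.g. AIDs)
                                              # are rewritten before injection
        resp_a, sw_a = transmit(card_a, apdu)
        resp_b, sw_b = transmit(card_b, apdu)
        if (resp_a, sw_a) != (resp_b, sw_b):  # payload or status word differs
            divergences.append({
                "command": entry["classification"],  # e.g. "SELECT ISD-R"
                "card_a": (resp_a.hex(), hex(sw_a)),
                "card_b": (resp_b.hex(), hex(sw_b)),
            })
    return divergences
\end{verbatim}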

\paragraph{Motivation.} This design provides a realistic baseline for comparison and reproducibility without requiring full specification access or assuming protocol compliance. It enables empirical analysis of protocol behavior under operational conditions.

\section{Design 2: APDU Fuzzing}
\label{subsec:design_2}

The second design focuses on exploring the input space of the \gls{euicc} \gls{rsp} protocol stack by mutating valid \glspl{apdu}. The aim is to test robustness against malformed, unexpected, or edge-case inputs and to expose implementation-level inconsistencies.

\paragraph{Design Rationale.} While real traces offer insight into typical usage, they often fail to reveal vulnerabilities related to invalid inputs. \gls{apdu} fuzzing is essential for testing the correctness of error handling and boundary enforcement.
\marginpar{Design 2 mutates valid APDUs to test eUICC robustness and correctness by comparing recorded behavior.}

\paragraph{Key Components.}
\begin{itemize}
\item \textbf{Scenario-Based Execution:} Scenarios are high-level sequences of \gls{euicc} operations (\eg, profile download) that anchor the fuzzing process.
\item \textbf{Mutation Engine:} Valid \glspl{apdu} are mutated using deterministic and randomized strategies, including bit-flipping, truncation, data zeroing, byte replacement, and block shuffling; a sketch of these strategies follows the list.
\item \textbf{Mutation Tree Representation:} The fuzzer constructs a hierarchical tree representing each function call, input mutation, and observed result, supporting exhaustive and resumable test runs.
\item \textbf{Exception-Aware Runner:} Each test is isolated, and card resets are used to restore a clean state, preventing a single failure from corrupting the session.
\item \textbf{Comparison:} Results from multiple \glspl{euicc} are compared node-by-node. Deviations in status words, exceptions, or data are reported and visualized to highlight divergent execution paths.
\end{itemize}
\end{itemize}
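
As an illustration of these mutation strategies, the sketch below implements the five mutators over raw \gls{apdu} data. It is an illustrative sketch, not the framework's actual implementation; seeding the random generator is what makes each mutation deterministic and hence reproducible across cards.

\begin{verbatim}
import random

# Sketch of the byte-level mutation strategies. A fixed seed makes each
# mutation reproducible, enabling side-by-side runs on different eUICCs.

def mutate(data: bytes, strategy: str, seed: int) -> bytes:
    rng = random.Random(seed)
    buf = bytearray(data)                 # assumes non-empty APDU data
    if strategy == "bit_flip":            # flip one random bit
        i = rng.randrange(len(buf))
        buf[i] ^= 1 << rng.randrange(8)
    elif strategy == "zero_block":        # zero out a random block
        start = rng.randrange(len(buf))
        end = min(len(buf), start + rng.randrange(1, 9))
        buf[start:end] = bytes(end - start)
    elif strategy == "byte_replace":      # overwrite one byte at random
        buf[rng.randrange(len(buf))] = rng.randrange(256)
    elif strategy == "block_shuffle":     # reorder fixed-size blocks
        blocks = [buf[i:i + 4] for i in range(0, len(buf), 4)]
        rng.shuffle(blocks)
        buf = bytearray(b"".join(blocks))
    elif strategy == "truncate":          # drop a random-length tail
        buf = buf[:rng.randrange(len(buf))]
    return bytes(buf)
\end{verbatim}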

\paragraph{Motivation.} \gls{apdu} fuzzing allows systematic probing of error-handling logic, \gls{asn1} decoding boundaries, and specification ambiguities. The use of deterministic strategies supports reproducibility and enables direct comparison across different cards.

\section{Design 3: Data Fuzzing}
\label{subsec:design_3}

The third design targets application-level logic using structurally valid inputs. It leverages property-based testing to exercise schema-conformant payloads and detect semantic inconsistencies or robustness issues.

\paragraph{Design Rationale.} Data fuzzing explores the validity boundaries of specific protocol fields. Unlike raw \gls{apdu} mutation, it focuses on high-level, syntactically valid but semantically unusual inputs to stress the logic of the \gls{lpa}-\gls{euicc} interaction.
\marginpar{Design 3 targets semantic correctness using schema-conformant, high-level input fuzzing.}

\paragraph{Key Components.}
\begin{itemize}
\item \textbf{Type-Aware Input Generation:} Payloads are generated according to type definitions and field constraints, ensuring compliance with expected formats.
\item \textbf{Property-Based Fuzzing:} A wide range of structurally valid inputs is generated to systematically test \gls{lpa} application-layer endpoints; see the sketch after this list.
\item \textbf{No Oracle Required:} Rather than expecting specific output, the tests flag crashes, exceptions, or malformed responses as anomalies.
\item \textbf{Replayable Tests:} Fuzzing inputs are recorded and replayable across multiple cards, enabling differential analysis and regression testing.
\end{itemize}
\end{itemize}
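
The sketch below illustrates this property-based style using the Hypothesis library. The ICCID strategy and the \texttt{lpa\_get\_profile\_info} endpoint are hypothetical stand-ins for the schema-derived generators and the \gls{lpa} application-layer interface; the assertions encode only generic robustness properties, since no behavioral oracle is assumed.

\begin{verbatim}
from hypothesis import given, strategies as st

# Property-based sketch using the Hypothesis library. The strategy and
# the endpoint below are hypothetical stand-ins for the schema-derived
# generators and the LPA application-layer interface.

def lpa_get_profile_info(iccid: str) -> bytes:
    """Hypothetical LPA endpoint; wired to the real interface later."""
    raise NotImplementedError

# Structurally valid ICCIDs: 19-20 decimal digits.
iccids = st.text(alphabet="0123456789", min_size=19, max_size=20)

@given(iccid=iccids)
def test_profile_info_is_robust(iccid):
    # No oracle: the only properties asserted are that the call does
    # not crash and that it returns a non-empty response.
    response = lpa_get_profile_info(iccid)
    assert response
\end{verbatim}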

\paragraph{Motivation.} This design complements trace-based and \gls{apdu}-level fuzzing by shifting the focus to semantic and structural correctness. It enables in-depth testing of parser robustness and adherence to data schemas within application-level interfaces.

\section{Design Comparison}
\begin{table}[t]
\centering
\caption{Comparison of Design Strategies}
\label{tab:design-strategies}
\begin{tabular}{|l p{.25\textwidth} p{.25\textwidth} p{.25\textwidth}|}
\hline
\textbf{Design} & \textbf{Goal} & \textbf{Mutation Type} & \textbf{Input Validity} \\
\hline
Design 1 & Behavioral Comparison & None (replay only) & Fully valid \\
\hline
Design 2 & Protocol Robustness Testing & Byte-level mutations & Valid base, mutated \\
\hline
Design 3 & Semantic Boundary Exploration & Schema-level & Structurally valid \\
\hline
\end{tabular}
\end{table}

Each of the three design strategies presented in this chapter targets a different dimension of the fuzzing and differential testing problem, offering complementary strengths and tradeoffs, as summarized in \cref{tab:design-strategies}.

\textbf{Tracing and Replay} focuses on the deterministic reproduction of real-world \gls{lpa}-\gls{euicc} sessions. By replaying fully valid \gls{apdu} sequences captured from live devices, this strategy ensures strict behavioral equivalence and reproducibility. However, it is limited in its ability to explore malformed or edge-case inputs.

\textbf{APDU-Level Fuzzing} extends this foundation by introducing structured mutations into valid \glspl{apdu}. It strikes a balance between input validity and exploratory depth, allowing the framework to probe robustness, error-handling routines, and implementation-specific divergences while still supporting comparative analysis across multiple \glspl{euicc}.
\marginpar{Each design targets a distinct dimension of eUICC differential testing; their combination yields a broad, modular fuzzing and testing framework.}

\textbf{Structured Data Fuzzing}, finally, operates at the semantic layer by generating well-formed but edge-case-rich inputs for application-level interfaces. This approach excels at uncovering logic flaws and inconsistencies in the parsing and interpretation of complex data structures, particularly those encoded in \gls{asn1}.

Combined, these designs form a comprehensive and modular fuzzing framework capable of both functional and robustness testing of commercial \gls{esim} and eSIM-on-SIM implementations. Their integration enables wide coverage of the input space, from valid production-level traffic to syntactically and semantically malformed payloads, thereby supporting rigorous security and conformance evaluations.