% !TeX root = ../Thesis.tex
%************************************************
\chapter{Design}\label{ch:design}
%************************************************
\glsresetall % Resets all acronyms to not used

This section introduces the core design of our differential testing framework \sysname, which supports systematic analysis of multiple commercial and test \glspl{euicc}. The goal is to provide a flexible and extensible platform capable of:
\begin{itemize}
\item Reproducing and replaying real-world interactions between \glspl{euicc} and \glspl{lpa},
\item Mutating protocol-level inputs to explore edge cases and verify robustness,
\item Comparing card responses under similar inputs to identify differences.
\end{itemize}
\marginpar{\sysname is the differential testing framework for eSIM analysis.}

To achieve these goals, we propose a modular three-layered architecture:

\begin{enumerate}
\item \textbf{Tracing and Replay}
\item \textbf{APDU Fuzzing}
\item \textbf{Data Fuzzing}
\end{enumerate}

\section{Design 1: Tracing and Replay}
\label{subsec:design_1}

The first design focuses on capturing and replaying real interaction sequences between an \gls{lpa} and a target \gls{euicc}. This allows deterministic replay of recorded \gls{apdu} sequences on different cards for side-by-side comparison.

\paragraph{Design Rationale.} Real-world traces provide insights into how \glspl{euicc} are used in practice, including undocumented behavior not covered by specifications. Replaying identical \gls{apdu} sequences across cards enables direct differential testing. To ensure realistic conditions, the setup is designed to remain as close as possible to the original communication path between \gls{lpa} and \gls{euicc}.
\marginpar{Design 1 replays real LPA–eUICC interactions for side-by-side card comparison.}

\paragraph{Key Components.}
\begin{itemize}
\item \textbf{Passive Tracing:} A tracing module passively intercepts \gls{apdu} exchanges over a physical interface using \texttt{simtrace2}. Commands are partially classified and tagged with functional metadata, such as the selection of the \gls{isdr}.
\item \textbf{Structured Recording:} Each session is recorded along with metadata, including command classifications, source and target \glspl{aid}, and session context.
\item \textbf{Replay Engine:} Captured traces are injected into a session between an \gls{lpa} and an \gls{euicc}. The engine adjusts session-specific fields (\eg, \glspl{aid}) and flags diverging behavior in response status words or payloads; a sketch of this comparison step follows the list.
\end{itemize}
\end{itemize}
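
To make the comparison step concrete, the following Python sketch shows how a recorded trace could be replayed against two cards and the responses diffed. It is a minimal sketch rather than the framework's definitive implementation: the \texttt{transmit} helper and the trace-entry fields are hypothetical stand-ins for the card-access layer and the recording format.

\begin{verbatim}
# Minimal replay-and-compare sketch. `transmit` and the trace-entry
# fields are hypothetical stand-ins for the card-access layer;
# `transmit` is assumed to return (response bytes, status word int).

def replay_and_compare(trace, card_a, card_b, transmit):
    """Replay recorded APDUs on two cards and flag divergences."""
    divergences = []
    for entry in trace:                       # one classified APDU per entry
        apdu = entry["apdu"]                  # raw command bytes; session-
                                              # specific fields (e.g. AIDs)
                                              # are rewritten before injection
        resp_a, sw_a = transmit(card_a, apdu)
        resp_b, sw_b = transmit(card_b, apdu)
        if (resp_a, sw_a) != (resp_b, sw_b):  # payload or status word differs
            divergences.append({
                "command": entry["classification"],  # e.g. "SELECT ISD-R"
                "card_a": (resp_a.hex(), hex(sw_a)),
                "card_b": (resp_b.hex(), hex(sw_b)),
            })
    return divergences
\end{verbatim}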

\paragraph{Motivation.} This design provides a realistic baseline for comparison and reproducibility without requiring full specification access or assuming protocol compliance. It enables empirical analysis of protocol behavior under operational conditions.

\section{Design 2: APDU Fuzzing}
\label{subsec:design_2}

The second design focuses on exploring the input space of the \gls{euicc} \gls{rsp} protocol stack by mutating valid \glspl{apdu}. The aim is to test robustness against malformed, unexpected, or edge-case inputs and to expose implementation-level inconsistencies.

\paragraph{Design Rationale.} While real traces offer insight into typical usage, they often fail to reveal vulnerabilities related to invalid inputs. \gls{apdu} fuzzing is essential for testing the correctness of error handling and boundary enforcement.
\marginpar{Design 2 mutates valid APDUs to test eUICC robustness and correctness by comparing recorded behavior.}

\paragraph{Key Components.}
\begin{itemize}
\item \textbf{Scenario-Based Execution:} Scenarios are high-level sequences of \gls{euicc} operations (\eg, profile download) that anchor the fuzzing process.
\item \textbf{Mutation Engine:} Valid \glspl{apdu} are mutated using deterministic and randomized strategies, including bit-flipping, truncation, data zeroing, byte replacement, and block shuffling; a sketch of these strategies follows the list.
\item \textbf{Mutation Tree Representation:} The fuzzer constructs a hierarchical tree representing each function call, input mutation, and observed result, supporting exhaustive and resumable test runs.
\item \textbf{Exception-Aware Runner:} Each test is isolated, and card resets are used to restore a clean state, preventing a single failure from corrupting the session.
\item \textbf{Comparison:} Results from multiple \glspl{euicc} are compared node-by-node. Deviations in status words, exceptions, or data are reported and visualized to highlight divergent execution paths.
\end{itemize}
\end{itemize}
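
As an illustration of these mutation strategies, the sketch below implements the five mutators over raw \gls{apdu} data. It is an illustrative sketch, not the framework's actual implementation; seeding the random generator is what makes each mutation deterministic and hence reproducible across cards.

\begin{verbatim}
import random

# Sketch of the byte-level mutation strategies. A fixed seed makes each
# mutation reproducible, enabling side-by-side runs on different eUICCs.

def mutate(data: bytes, strategy: str, seed: int) -> bytes:
    rng = random.Random(seed)
    buf = bytearray(data)                 # assumes non-empty APDU data
    if strategy == "bit_flip":            # flip one random bit
        i = rng.randrange(len(buf))
        buf[i] ^= 1 << rng.randrange(8)
    elif strategy == "zero_block":        # zero out a random block
        start = rng.randrange(len(buf))
        end = min(len(buf), start + rng.randrange(1, 9))
        buf[start:end] = bytes(end - start)
    elif strategy == "byte_replace":      # overwrite one byte at random
        buf[rng.randrange(len(buf))] = rng.randrange(256)
    elif strategy == "block_shuffle":     # reorder fixed-size blocks
        blocks = [buf[i:i + 4] for i in range(0, len(buf), 4)]
        rng.shuffle(blocks)
        buf = bytearray(b"".join(blocks))
    elif strategy == "truncate":          # drop a random-length tail
        buf = buf[:rng.randrange(len(buf))]
    return bytes(buf)
\end{verbatim}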

\paragraph{Motivation.} \gls{apdu} fuzzing allows systematic probing of error-handling logic, \gls{asn1} decoding boundaries, and specification ambiguities. The use of deterministic strategies supports reproducibility and enables direct comparison across different cards.

\section{Design 3: Data Fuzzing}
\label{subsec:design_3}

The third design targets application-level logic using structurally valid inputs. It leverages property-based testing to exercise schema-conformant payloads and detect semantic inconsistencies or robustness issues.

\paragraph{Design Rationale.} Data fuzzing explores the validity boundaries of specific protocol fields. Unlike raw \gls{apdu} mutation, it focuses on high-level, syntactically valid but semantically unusual inputs to stress the logic of the \gls{lpa}-\gls{euicc} interaction.
\marginpar{Design 3 targets semantic correctness using schema-conformant, high-level input fuzzing.}

\paragraph{Key Components.}
\begin{itemize}
\item \textbf{Type-Aware Input Generation:} Payloads are generated according to type definitions and field constraints, ensuring compliance with expected formats.
\item \textbf{Property-Based Fuzzing:} A wide range of structurally valid inputs is generated to systematically test \gls{lpa} application-layer endpoints; see the sketch after this list.
\item \textbf{No Oracle Required:} Rather than expecting specific output, the tests flag crashes, exceptions, or malformed responses as anomalies.
\item \textbf{Replayable Tests:} Fuzzing inputs are recorded and replayable across multiple cards, enabling differential analysis and regression testing.
\end{itemize}
\end{itemize}
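
The sketch below illustrates this property-based style using the Hypothesis library. The ICCID strategy and the \texttt{lpa\_get\_profile\_info} endpoint are hypothetical stand-ins for the schema-derived generators and the \gls{lpa} application-layer interface; the assertions encode only generic robustness properties, since no behavioral oracle is assumed.

\begin{verbatim}
from hypothesis import given, strategies as st

# Property-based sketch using the Hypothesis library. The strategy and
# the endpoint below are hypothetical stand-ins for the schema-derived
# generators and the LPA application-layer interface.

def lpa_get_profile_info(iccid: str) -> bytes:
    """Hypothetical LPA endpoint; wired to the real interface later."""
    raise NotImplementedError

# Structurally valid ICCIDs: 19-20 decimal digits.
iccids = st.text(alphabet="0123456789", min_size=19, max_size=20)

@given(iccid=iccids)
def test_profile_info_is_robust(iccid):
    # No oracle: the only properties asserted are that the call does
    # not crash and that it returns a non-empty response.
    response = lpa_get_profile_info(iccid)
    assert response
\end{verbatim}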

\paragraph{Motivation.} This design complements trace-based and \gls{apdu}-level fuzzing by shifting the focus to semantic and structural correctness. It enables in-depth testing of parser robustness and adherence to data schemas within application-level interfaces.

\section{Design Comparison}
\begin{table}[t]
\centering
\caption{Comparison of Design Strategies}
\label{tab:design-strategies}
\begin{tabular}{|l p{.25\textwidth} p{.25\textwidth} p{.25\textwidth}|}
\hline
\textbf{Design} & \textbf{Goal} & \textbf{Mutation Type} & \textbf{Input Validity} \\
\hline
Design 1 & Behavioral Comparison & None (replay only) & Fully valid \\
\hline
Design 2 & Protocol Robustness Testing & Byte-level mutations & Valid base, mutated \\
\hline
Design 3 & Semantic Boundary Exploration & Schema-level & Structurally valid \\
\hline
\end{tabular}
\end{table}

Each of the three design strategies presented in this chapter targets a different dimension of the fuzzing and differential testing problem, offering complementary strengths and tradeoffs, as summarized in \cref{tab:design-strategies}.

\textbf{Tracing and Replay} focuses on the deterministic reproduction of real-world \gls{lpa}-\gls{euicc} sessions. By replaying fully valid \gls{apdu} sequences captured from live devices, this strategy ensures strict behavioral equivalence and reproducibility. However, it is limited in its ability to explore malformed or edge-case inputs.

\textbf{APDU-Level Fuzzing} extends this foundation by introducing structured mutations into valid \glspl{apdu}. It strikes a balance between input validity and exploratory depth, allowing the framework to probe robustness, error-handling routines, and implementation-specific divergences while still supporting comparative analysis across multiple \glspl{euicc}.
\marginpar{Each design targets a distinct dimension of eUICC differential testing; their combination yields a broad, modular fuzzing and testing framework.}

\textbf{Structured Data Fuzzing}, finally, operates at the semantic layer by generating well-formed but edge-case-rich inputs for application-level interfaces. This approach excels at uncovering logic flaws and inconsistencies in the parsing and interpretation of complex data structures, particularly those encoded in \gls{asn1}.

Combined, these designs form a comprehensive and modular fuzzing framework capable of both functional and robustness testing of commercial \gls{esim} and eSIM-on-SIM implementations. Their integration enables wide coverage of the input space, from valid production-level traffic to syntactically and semantically malformed payloads, thereby supporting rigorous security and conformance evaluations.