In-App Network Monitoring
App runtime environments are complex. A large proportion of user complaints center on “slow page loading,” “login failures,” and “unstable connections” — network-related issues that are often unreproducible in test environments. Traditional server-side monitoring or CDN logs cannot fully reconstruct the real network state on the client side.
The core objectives of this in-app network monitoring solution are:
- Proactively sense the user’s actual current network conditions
- Precisely identify the real root cause of network failures
- Provide structured data support for diagnosing, fixing, and optimizing network issues
The implementation consists of two parts: application speed checks and DNS hijacking detection.
Application Speed Check
Principle
This module uses Android’s native TrafficStats API and measures at application UID granularity. Within a short fixed window (5 seconds), it reads the increase in received bytes and divides it by the elapsed time to obtain the client’s actual download speed. The approach requires no extra permissions or network requests, is lightweight, and reflects the user’s real network experience.
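The calculation described above can be sketched roughly as follows. This is a minimal illustration, not the original implementation: `SpeedSampler` and its method names are hypothetical, and the two `LongSupplier` arguments stand in for the Android calls (`TrafficStats.getUidRxBytes(Process.myUid())` for the byte counter and `SystemClock.elapsedRealtime()` for the clock) so the logic can run on a plain JVM.

```java
import java.util.function.LongSupplier;

/** Hypothetical sketch of window-based download speed sampling. */
final class SpeedSampler {
    // On Android this would be: () -> TrafficStats.getUidRxBytes(Process.myUid())
    private final LongSupplier rxBytes;
    // On Android this would be: SystemClock::elapsedRealtime
    private final LongSupplier clockMs;

    SpeedSampler(LongSupplier rxBytes, LongSupplier clockMs) {
        this.rxBytes = rxBytes;
        this.clockMs = clockMs;
    }

    /** Bytes per second, or -1 when the counter is unsupported or no time has elapsed. */
    static double bytesPerSecond(long startBytes, long endBytes, long elapsedMs) {
        // TrafficStats reports -1 (UNSUPPORTED) when per-UID statistics are unavailable.
        if (startBytes < 0 || endBytes < startBytes || elapsedMs <= 0) return -1;
        return (endBytes - startBytes) * 1000.0 / elapsedMs;
    }

    /** Blocks for windowMs (5000 in the text) and returns the measured speed. */
    double sample(long windowMs) throws InterruptedException {
        long startBytes = rxBytes.getAsLong();
        long startMs = clockMs.getAsLong();
        Thread.sleep(windowMs);
        long endBytes = rxBytes.getAsLong();
        long elapsedMs = clockMs.getAsLong() - startMs;
        return bytesPerSecond(startBytes, endBytes, elapsedMs);
    }
}
```

Dividing by the measured elapsed time rather than the nominal window length keeps the result accurate when the sleeping thread wakes up late.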
Check Triggers
- App launch: Execute immediately on first launch
- Network switch: e.g., switching from WiFi to cellular
- App foreground resume: Avoid missing network state changes during background periods
- Request failure: Assist in diagnosing network-layer causes
- Every 20 seconds: Satisfy continuous sampling needs in long-session scenarios
Each check occupies a 5-second sampling window, and an atomic variable marks whether a check is currently in progress so that only one runs at a time. This prevents concurrent checks on multiple threads from skewing the statistics, and it also means triggers that arrive during a running check are naturally deduplicated, so no additional cooldown period is needed.
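One way to express the single-check guard is with `AtomicBoolean.compareAndSet`. The sketch below is an assumption about how such a guard might look; the class name `SingleFlightCheck` and the method `tryRun` are hypothetical.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical guard that lets at most one speed check run at a time. */
final class SingleFlightCheck {
    private final AtomicBoolean running = new AtomicBoolean(false);

    /** Runs the check unless another one is in progress; returns whether it ran. */
    boolean tryRun(Runnable check) {
        // Only the caller that flips false -> true proceeds; all others are dropped.
        if (!running.compareAndSet(false, true)) return false;
        try {
            check.run();
            return true;
        } finally {
            // Always clear the flag, even if the check throws.
            running.set(false);
        }
    }
}
```

A trigger that fires while a check is sampling simply gets `false` back and is discarded, which is the deduplication behavior described above.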
Data Reporting
| Field | Example Value |
|---|---|
| Average download speed | 32.00 KB/s |
| Current network type | WIFI / 4G / 5G |
| Carrier name | China Unicom |
| Cellular signal strength | -88 |
| Roaming status | false |
| WiFi signal level | 3 |
Note: Some fields require specific permissions (signal strength/WiFi level require ACCESS_FINE_LOCATION; roaming status requires READ_PHONE_STATE). The solution only checks for permissions without actively requesting them — if permissions are missing, corresponding fields aren’t reported. Speed values are automatically formatted as B/s, KB/s, or MB/s based on magnitude.
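The unit selection for speed values might look like the sketch below. The 1024 base and the two-decimal format are assumptions chosen to match the "32.00 KB/s" example; `SpeedFormat` is a hypothetical name.

```java
import java.util.Locale;

/** Hypothetical formatter that picks B/s, KB/s, or MB/s by magnitude. */
final class SpeedFormat {
    private static final double KB = 1024.0;
    private static final double MB = 1024.0 * 1024.0;

    static String format(double bytesPerSecond) {
        // Locale.US keeps the decimal separator stable across device locales.
        if (bytesPerSecond < KB) {
            return String.format(Locale.US, "%.2f B/s", bytesPerSecond);
        }
        if (bytesPerSecond < MB) {
            return String.format(Locale.US, "%.2f KB/s", bytesPerSecond / KB);
        }
        return String.format(Locale.US, "%.2f MB/s", bytesPerSecond / MB);
    }
}
```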
DNS Hijacking Detection
Principle
DNS hijacking is a common client-side network security issue, more likely to occur on public WiFi or with certain smaller carriers. We detect hijacking by comparing system DNS results against trusted DoH (DNS over HTTPS) service results.
- System DNS: Results obtained via InetAddress.getAllByName()
- DoH services: Trusted results obtained over encrypted connections from Google DNS, AliDNS, and Tencent DNSPod
Google DNS: https://dns.google/resolve?name=%s&type=A
AliDNS: https://dns.alidns.com/resolve?name=%s&type=A
DNSPod: https://doh.pub/dns-query?name=%s&type=A
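As a rough illustration of how these endpoint templates could be used, the sketch below fills in the `%s` placeholder and pulls IPv4 addresses out of a response body. It assumes the JSON answer shape documented for Google's API (an `Answer` array whose A records carry the address in a `data` field) and assumes the other providers respond similarly; the regex-based extraction is a shortcut for the example, and a real client would more likely use a JSON parser. `DohJson` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for JSON-style DoH endpoints. */
final class DohJson {
    // Matches "data":"a.b.c.d" only, so CNAME targets (host names) are skipped.
    private static final Pattern IPV4_DATA =
            Pattern.compile("\"data\"\\s*:\\s*\"(\\d{1,3}(?:\\.\\d{1,3}){3})\"");

    /** Fills the %s placeholder of an endpoint template with the domain. */
    static String buildUrl(String template, String domain) {
        return String.format(Locale.ROOT, template, domain);
    }

    /** Extracts the IPv4 addresses found in a JSON response body. */
    static List<String> parseIpv4(String json) {
        List<String> ips = new ArrayList<>();
        Matcher m = IPV4_DATA.matcher(json);
        while (m.find()) {
            ips.add(m.group(1));
        }
        return ips;
    }
}
```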
Trusted IP Pool and Caching
The trusted IP pool consists of two parts:
- Current DoH results: IPs obtained from the three DoH services in this check
- Historical DoH cache: Previously successfully resolved IPs via DoH, stored locally with a 90-day validity period
The cache uses a structured storage design built around a “domain-IP-timestamp” association: IPs are grouped by domain, and each IP entry carries a last-updated timestamp used for expiration checks. Core strategies:
- Cache update: After each successful DoH resolution, valid IPs are added to the domain’s cache; existing IPs get their timestamps refreshed to maintain active IP validity
- Expiration cleanup: Periodically filters out IP entries exceeding the 90-day validity period, preventing cache bloat on the device while ensuring timeliness
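The domain-IP-timestamp structure and the two strategies above could be modeled in memory as shown below. This is only a sketch: `TrustedIpCache` is a hypothetical name, timestamps are passed in explicitly to keep the example deterministic, and persistence to local storage is left out.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical in-memory model of the historical DoH cache. */
final class TrustedIpCache {
    static final long TTL_MS = 90L * 24 * 60 * 60 * 1000; // 90-day validity

    // domain -> (ip -> last-updated timestamp in milliseconds)
    private final Map<String, Map<String, Long>> byDomain = new HashMap<>();

    /** Adds new IPs and refreshes the timestamp of IPs already cached. */
    synchronized void update(String domain, Collection<String> ips, long nowMs) {
        Map<String, Long> entries = byDomain.computeIfAbsent(domain, d -> new HashMap<>());
        for (String ip : ips) {
            entries.put(ip, nowMs);
        }
    }

    /** Drops entries older than the validity period, then drops empty domains. */
    synchronized void evictExpired(long nowMs) {
        for (Map<String, Long> entries : byDomain.values()) {
            entries.values().removeIf(ts -> nowMs - ts > TTL_MS);
        }
        byDomain.values().removeIf(Map::isEmpty);
    }

    /** Returns the still-valid cached IPs for a domain. */
    synchronized Set<String> ipsFor(String domain, long nowMs) {
        Set<String> valid = new HashSet<>();
        Map<String, Long> entries = byDomain.get(domain);
        if (entries != null) {
            entries.forEach((ip, ts) -> {
                if (nowMs - ts <= TTL_MS) valid.add(ip);
            });
        }
        return valid;
    }
}
```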
Before executing the judgment logic, a network connectivity check confirms the device is actually connected, avoiding misdiagnosis of no-network situations as DNS issues. The specific logic:
- If all IPs returned by system DNS are in the trusted IP pool → resolution normal
- If any system DNS IP is not in the trusted IP pool → suspected hijacking
This design accounts for CDN and load-balancing scenarios where the same domain’s resolution results vary by time and region. Relying solely on real-time DoH comparison would easily misidentify normal IP rotation as hijacking. Maintaining a historical DoH cache effectively mitigates this.
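The pool comparison can be sketched as a set union followed by a set difference. The names `TrustedPool`, `build`, and `untrusted` are hypothetical; returning the anomalous IPs rather than a plain boolean is a design choice made here so that they can be included in a report.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper for the trusted IP pool comparison. */
final class TrustedPool {
    /** The pool is the union of this check's DoH results and the cached DoH results. */
    static Set<String> build(Set<String> currentDohIps, Set<String> cachedDohIps) {
        Set<String> pool = new HashSet<>(currentDohIps);
        pool.addAll(cachedDohIps);
        return pool;
    }

    /** System DNS IPs missing from the pool; empty means resolution looks normal. */
    static Set<String> untrusted(Set<String> systemIps, Set<String> pool) {
        Set<String> anomalous = new TreeSet<>(systemIps);
        anomalous.removeAll(pool);
        return anomalous;
    }
}
```

With a pool built from current results `[1.2.3.4]` and cached results `[9.9.9.9]`, a system answer of `[1.2.3.4, 9.9.9.9]` is treated as normal even though the live DoH lookup returned only one of the two addresses.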
Check Triggers
To conserve device resources, DNS hijacking checks are triggered only when a network request fails and the failure matches one of the specific exception types, with a 5-second cooldown between checks. The cooldown prevents a single ongoing problem (such as a persistent connection failure) from triggering detection repeatedly, which reduces redundant DoH requests (saving user data) and avoids tying up worker threads (keeping the app responsive).
Compared to high-frequency triggers like launch or network switches, this on-demand trigger + precise exception matching strategy better fits real business scenarios while reducing unnecessary resource consumption.
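A cooldown of this kind could be implemented with a single atomic timestamp, as in the sketch below. `Cooldown` and `tryAcquire` are hypothetical names, and the caller is assumed to pass a monotonic clock value (for example `SystemClock.elapsedRealtime()` on Android).

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical cooldown gate for DNS hijacking checks. */
final class Cooldown {
    private static final long NEVER = Long.MIN_VALUE;

    private final long intervalMs;
    private final AtomicLong lastRunMs = new AtomicLong(NEVER);

    Cooldown(long intervalMs) {
        this.intervalMs = intervalMs;
    }

    /** Returns true and records nowMs if the interval has passed since the last accepted call. */
    boolean tryAcquire(long nowMs) {
        while (true) {
            long last = lastRunMs.get();
            if (last != NEVER && nowMs - last < intervalMs) return false;
            // The CAS ensures only one of several racing threads wins the slot.
            if (lastRunMs.compareAndSet(last, nowMs)) return true;
        }
    }
}
```

With `new Cooldown(5000)`, a burst of failures from one broken connection results in a single check, and later failures are ignored until five seconds have elapsed.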
Exception Type Set
The following network exceptions are highly correlated with DNS hijacking:
| Exception Type | Cause Category | Description |
|---|---|---|
| CertPathValidatorException | Untrusted certificate | TLS certificate chain not trusted, common in MITM attacks |
| SSLHandshakeException / SSLPeerUnverifiedException | TLS handshake failure | Certificate chain anomaly or interrupted handshake, possibly hijacking-induced |
| UnknownHostException | DNS resolution failure | Cannot resolve domain to IP, commonly seen in DNS hijacking/poisoning |
| ConnectException | Connection refused | Network reachable but target port unresponsive, possibly a fake address post-hijack |
| SocketTimeoutException | TCP timeout | Network congestion or service unreachable, investigate alongside DNS results |
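Matching a failure against this set might be done by walking the exception's cause chain, since HTTP clients often wrap the underlying error. The sketch below is one possible approach; `DnsCheckTrigger` is a hypothetical name, and the depth limit of 16 is an arbitrary safeguard against cyclic cause chains.

```java
import java.net.ConnectException;
import java.net.SocketTimeoutException;
import java.net.UnknownHostException;
import java.security.cert.CertPathValidatorException;
import javax.net.ssl.SSLHandshakeException;
import javax.net.ssl.SSLPeerUnverifiedException;

/** Hypothetical matcher deciding whether a failure should trigger a DNS check. */
final class DnsCheckTrigger {
    private static final int MAX_DEPTH = 16; // arbitrary guard against cyclic causes

    static boolean shouldTrigger(Throwable error) {
        int depth = 0;
        for (Throwable t = error; t != null && depth < MAX_DEPTH; t = t.getCause(), depth++) {
            if (t instanceof CertPathValidatorException
                    || t instanceof SSLHandshakeException
                    || t instanceof SSLPeerUnverifiedException
                    || t instanceof UnknownHostException
                    || t instanceof ConnectException
                    || t instanceof SocketTimeoutException) {
                return true;
            }
        }
        return false;
    }
}
```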
Detection Result Categories
Based on system DNS and DoH resolution outcomes, results are classified as:
| Result Status | Determination Condition | Description |
|---|---|---|
| Hijacked | Both system DNS and DoH succeed, but system IPs not in trusted pool | Suspected DNS hijacking |
| Network Issue | Network disconnection detected | Real network connectivity interruption |
| System DNS Failure | System DNS fails but DoH succeeds | Local DNS configuration issue |
| DoH Failure | System DNS succeeds but DoH fails | DoH service unavailable |
| Normal | All system DNS IPs in trusted pool | Resolution results consistent |
| Unknown | Network connected but all DNS services fail | DNS server issues or port blocking |
This fine-grained classification helps quickly pinpoint root causes, avoiding the trap of attributing all anomalies to “hijacking.”
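The table can be read as a decision sequence, sketched below. The status names `IS_HIJACKED`, `PASS`, and `NETWORK_FAIL` come from the reporting fields; the remaining names, the class `DnsClassifier`, and the order in which the conditions are tested are assumptions. An empty set is used here to represent a failed lookup.

```java
import java.util.Set;

/** Hypothetical classifier mirroring the result table. */
final class DnsClassifier {
    enum Result { IS_HIJACKED, NETWORK_FAIL, SYSTEM_DNS_FAIL, DOH_FAIL, PASS, UNKNOWN }

    static Result classify(boolean networkConnected,
                           Set<String> systemIps,
                           Set<String> dohIps,
                           Set<String> trustedPool) {
        // Connectivity is checked first so an offline device is never reported as a DNS issue.
        if (!networkConnected) return Result.NETWORK_FAIL;

        boolean systemOk = !systemIps.isEmpty();
        boolean dohOk = !dohIps.isEmpty();
        if (!systemOk && !dohOk) return Result.UNKNOWN;
        if (!systemOk) return Result.SYSTEM_DNS_FAIL;
        if (!dohOk) return Result.DOH_FAIL;

        // Both lookups succeeded: every system IP must be in the trusted pool.
        return trustedPool.containsAll(systemIps) ? Result.PASS : Result.IS_HIJACKED;
    }
}
```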
Data Reporting
| Field | Example Value |
|---|---|
| Target domain | api.example.com |
| System DNS returned IPs | [1.2.3.4, 5.6.7.8] |
| DoH returned IPs summary | GoogleDNS=[1.2.3.4]; AliDNS=[1.2.3.4, 9.9.9.9] |
| Detection result status | IS_HIJACKED / PASS / NETWORK_FAIL etc. |
| Detailed reason | Contains anomalous IPs, trusted IP pool, etc. |
| Triggering exception type | TLS_CERT_UNTRUSTED_SIGNATURE / DNS_RESOLUTION_FAILED etc. |
Note: To avoid noise in analysis, only checks classified as Hijacked are reported.