Speaker Archive: 2021

This is an archive of weekly meeting talks from 2021. See the speaker archives for the talks from the current year, and links to archives from other years.

Date Speaker Links from talk Summary
2021-01-13 Cecylia

Papers:

Tor Bridge Distribution

Tor Bridges

  • What is a Tor Bridge?
    • Tor relays are public information and are therefore easy for censors to block
    • Bridges help solve this. Bridges are relays that aren't listed in the consensus of Tor relays
    • There are several types of Bridges:
      • Default Bridges: Baked into Tor browser
      • Private Bridges: Must be distributed manually
      • Distributed Bridges: A database of bridges is managed by the Tor Project, and users can request info on a bridge via email, HTTPS, or Tor's moat
  • We are assuming we have a way to distribute bridges to users
  • Censors can also request this info using above mentioned methods
  • The Tor Project has a few mechanisms to slow censors from enumerating known bridges, but bridges are still getting blocked by censors quite quickly
  • Potential Solution: reputation based bridge distribution system. If a bridge gets blocked, the user is less trustworthy. The longer a bridge is unblocked, the more trustworthy the user is.
  • See the "links from talk" section for papers on existing schemes
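The reputation idea above can be sketched as a toy model. This is purely illustrative and is not any of the cited schemes (rBridge, Salmon, Hyphae); the scoring constants are made up:

```python
# Toy reputation-based bridge distribution (illustrative sketch only;
# not the rBridge/Salmon/Hyphae designs referenced in the talk).

class BridgeDistributor:
    def __init__(self):
        self.trust = {}  # user id -> trust score

    def register(self, user):
        self.trust[user] = 0

    def bridge_survived(self, user, days):
        # The longer a user's bridge stays unblocked, the more
        # trustworthy the user becomes.
        self.trust[user] += days

    def bridge_blocked(self, user):
        # A blocked bridge lowers trust in every user who knew it;
        # the penalty of 10 is an arbitrary illustrative value.
        self.trust[user] -= 10

    def may_request_bridge(self, user, threshold=0):
        return self.trust.get(user, -1) >= threshold
```

A censor that burns bridges quickly accumulates penalties and loses the ability to request fresh ones, while long-standing honest users keep their access.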
Open Problems:
  • Sybil Defences:
    • Salmon requires a Facebook account
    • rBridge/Hyphae are invite-only
  • How do we determine when a bridge is blocked?
    • Self-reported?
    • OONI?
    • Censored Planet?
  • Privacy:
    • How do we prevent leaking social graph data?
    • Possibly future work w/ Hyphae's anonymous credentials
2021-01-13 David Chipwhisperer: https://www.newae.com/chipwhisperer

Discussed and demonstrated a side-channel attack on an Arm STM32 chip

Chipwhisperer sells kits that can be used for attacks similar to those demonstrated by David.

Kits come with everything you need, and Python is used to interface with them

2021-01-20 Asokan

Papers:

in-toto

Code Signing

Developer has key pair, and signs code using private key and user verifies with public key
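The sign-with-private-key / verify-with-public-key flow above can be sketched with textbook RSA over tiny toy primes. This is deliberately insecure and only illustrates the flow; real code signing uses vetted libraries and far larger keys:

```python
import hashlib

# Toy code signing with textbook RSA over tiny primes, purely to
# illustrate the sign-with-private-key / verify-with-public-key flow.
# NOT secure: real code signing uses vetted libraries and large keys.

p, q = 61, 53
n = p * q                           # public modulus (part of public key)
e = 17                              # public exponent
d = pow(e, -1, (p - 1) * (q - 1))   # private exponent (kept secret)

def digest(code: bytes) -> int:
    # Reduce the hash mod n only because the toy modulus is tiny.
    return int.from_bytes(hashlib.sha256(code).digest(), "big") % n

def sign(code: bytes) -> int:
    # Developer signs the code's digest with the private key.
    return pow(digest(code), d, n)

def verify(code: bytes, sig: int) -> bool:
    # User checks the signature with the public key (n, e).
    return pow(sig, e, n) == digest(code)

release = b"print('hello world')"
signature = sign(release)
```

If an attacker swaps the code but not the signature, `verify` almost surely fails, since the digest changes.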

Solarwinds Attack (2020)

  • System/Network management vendor
  • Network management system (Orion) <- needs to be able to monitor all devices in a network, manage updates, etc
  • Malware ended up getting signed

Software supply chains:

  • Software distribution infrastructure
  • example: CI, Build, source repo, packaging
  • In the case of SolarWinds, the attacker inserted malware into the build system or into the channel between the source repository and the build system

in-toto (Usenix SEC 2019) (See links for link to paper)

Threat model

  • interpose
  • compromise
  • incomplete product
  • counterfeit

Goals:

  • Security requirements
    • Integrity of supply-chain layout
    • Integrity of flows between steps
    • Step authentication
  • Functional requirements
    • Generality and transparency
    • graceful degradation of security

in-toto Design:

Define a graph that links steps together

They define a project owner which has a key pair

The functionary is the entity that runs a step and also has its own key pairs

Defines who is allowed to run each step

When a functionary produces a step, link data is generated and sent to the next step with a list of materials produced

Each step has a threshold to get a consensus (e.g. multiple repos)
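The per-step "link" idea above can be sketched as follows. The dict shape and field names here are hypothetical and loosely modelled on the description in the talk; the real in-toto metadata formats and APIs differ:

```python
import hashlib

# Hypothetical sketch of per-step "link" metadata, loosely modelled on
# the in-toto idea above. Field names and shapes are illustrative; the
# real in-toto wire format differs, and links are signed by the
# functionary's key (signing is elided here).

def artifact_hash(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def make_link(step_name, functionary, materials, products):
    # materials: artifacts the step consumed; products: what it produced.
    return {
        "step": step_name,
        "functionary": functionary,  # who ran the step (has a key pair)
        "materials": {name: artifact_hash(b) for name, b in materials.items()},
        "products": {name: artifact_hash(b) for name, b in products.items()},
    }

src = {"main.c": b"int main(){return 0;}"}
build_link = make_link("build", "ci-server", src, {"app.bin": b"binary-bytes"})

# The next step checks that its input materials match the previous
# step's products, giving integrity of the flow between steps.
package_materials = {"app.bin": b"binary-bytes"}
materials_ok = ({n_: artifact_hash(b) for n_, b in package_materials.items()}
                == build_link["products"])
```

A mismatch between one step's products and the next step's materials would signal interposition or a counterfeit artifact.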

Example in-toto deployment: Debian reproducible builds

40%-100% space overhead

Problems:

  • space overhead?
  • Protecting the project owner?
    • thresholding
    • offline keys
  • topology privacy?
    • zero knowledge proofs?
2021-01-20 Simon

Papers:

Search pattern leakage in searchable encryption

Access pattern disclosure on searchable encryption: Ramification, attack and mitigation

New Symmetric Searchable Encryption scheme:

Alice has a data set; she encrypts the database using a symmetric searchable encryption scheme

A query is transformed into a hidden query; which files match is revealed, but the query itself stays unknown

This can leak the access pattern, and common queries could be inferred from it

New scheme: (DP access patterns for searchable symmetric encryption)

Add false positives to the database

If queries don't look the same but the access pattern is the same, it can leak the query

Design goals:

Encrypted db

Random tokens

Fresh access pattern randomness

Performance metrics:

  • communication
  • storage at client
  • computation

Utility metrics:

  • true positive rate (close to 1)
  • false positive rate (close to 0)

Privacy:

  • Differential Privacy

Based on IPPE (Inner Product Predicate Encryption)

The adversary can only compute the inner product of two ciphertexts, but nothing else

Proposed scheme:

OSSE: Obfuscated SSE

Database and table, h: [n] -> [sizeof(h)]. Encrypt polynomial and attach label

Queries will encrypt query and attach label

The adversary will return a document if one of the x's is a root of an encrypted polynomial

Polynomial generation:

r_1 = (DOG + label + counter)

A special root is used as a false positive

Token Generation:

Find documents with "Dog"

for l from 1-> sizeof(h):
for c from 0 -> C_max:
x = (Dog + l + c) <- token/query + randomness for false negatives

False positives:

for id from 1-> n:
x=(id + 0 + -1) + randomness of if we send query or not

Non-matches:

for l from 1-> sizeof(h)
x = AAA + l + 0
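The token-generation loops above can be sketched concretely. This is a loud simplification: a plain hash of (value, label, counter) stands in for the inner-product predicate encryption, and the false-positive/false-negative randomness is elided; it only shows the loop structure and matching logic:

```python
import hashlib

# Very loose sketch of the token-generation loops above. A plain hash
# of (value, label, counter) stands in for the IPPE encryption used by
# the real OSSE scheme; randomness for false positives/negatives is
# omitted. SIZEOF_H and C_MAX values are illustrative.

SIZEOF_H = 4   # number of labels (hash range)
C_MAX = 2      # max counter value

def point(value: str, label: int, counter: int) -> str:
    return hashlib.sha256(f"{value}|{label}|{counter}".encode()).hexdigest()

def query_tokens(keyword: str):
    # "for l from 1 -> sizeof(h): for c from 0 -> C_max: x = (kw + l + c)"
    return {point(keyword, l, c)
            for l in range(1, SIZEOF_H + 1)
            for c in range(C_MAX + 1)}

def document_point(keyword: str, label: int, counter: int) -> str:
    # Encryption of one (keyword, label, counter) entry in the database.
    return point(keyword, label, counter)

tokens = query_tokens("Dog")
```

A document entry matches when its point falls in the query's token set; a fresh counter per entry is what lets the scheme randomize the access pattern across queries.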

DP analysis:

Epsilon = 13 (quite large) but good enough to defeat attacks

Communication overhead quite good

Computational complexity: comp < n(C_max + 1)

Client storage: very small burden

Evaluation: Frequency Attack:

Search pattern leakage in searchable encryption (See Paper links)

As number of queries increases, leakage increases

As false positive rate increases, leakage decreases

IKK - Access pattern disclosure on searchable encryption: Ramification, attack and mitigation (See Paper Links)

Accuracy lower than previous schemes

Conclusions:

Hiding search pattern is hard but effective against attacks

OSSE:

1 comm round, no client storage, hides search pattern, better asymptotic comm than ORAM

Computationally expensive: 30 min vs CLRZ's 20 ms

2021-01-27 Matthew R  

Project H.O.N.K.

Harmonized context detection framework

Background:

  • Context detection is using phone sensors to gather data about the environment
  • Trying to make a c.d.f. that will allow other applications to learn about the context the device is operating in

Explained using untitled goose game!

Basic idea:

Privacy: "How many people are around?"

Proximity: "How close is my device?"

Unfamiliarity: "How many strangers are around?"

How do we calculate these values? Using device sensor data:

  • GPS
  • Bluetooth
  • wlan
  • Accelerometer

Eventually:

  • Biometric Data
  • Application usage

What is it used for?

  • Authentication
  • Device Theft
  • Device loss

How have you implemented it?

There's an Android app, but testing during COVID is hard

Currently testing using MDC

Why MDC?

  • Years of user data
2021-01-27 Urs  

Having Fun with COVID Alert or "How I have spent some of the evenings during the pandemic"

COVID Alert:

  • Canada's exposure notification app
  • Smartphone app that broadcasts Bluetooth signals to phones in proximity
  • When users test positive, nearby users will be notified of the exposure
  • Based on the Google/Apple API, built by some volunteers at Shopify, security eval by BlackBerry, and maintained by the Canadian Digital Service
  • Open-source on GitHub

Fun Activity #1

  • Look at the code: it uses React Native, which makes the code really hard to follow
  • Let's focus on the upload part
  • When a user tests positive, the app uploads to Health Canada the secret codes it used for generating the broadcast signals
  • There is a traffic analysis threat
  • Observing this connection reveals the user's info

Traffic analysis threat:

  • ISP, cellphone op, NSA, or Amazon can learn of your infection
  • Notified CDS, Health Canada, OPC and IPC
  • Fix based on dummy uploads promised but nothing has happened
  • Germany, Switzerland, Italy, Belgium have this fix
  • HC, OPC, and IPC released privacy assessments accompanying the launch of COVID Alert, but none mentions this vulnerability

Fun Activity #2

  • Does anyone actually use COVID Alert for reporting infections?
  • App downloads the secret codes uploaded by infected users to learn of exposures
  • Let's write a script that downloads these codes

  • The codes use HMAC-SHA256 and need a secret key
  • The COVID Alert GitHub repo doesn't have the key
  • Every app has the key
  • Found the key, and it's now on GitHub
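The primitive in play here can be illustrated with Python's standard library. The key, message format, and endpoint below are made up for illustration; the actual COVID Alert key and request format are not reproduced:

```python
import hmac, hashlib

# The retrieval endpoint described above authenticates requests with
# HMAC-SHA256. This only illustrates the primitive with a made-up key
# and request string; the real COVID Alert key/format are not shown.

key = b"not-the-real-covid-alert-key"

def authenticate(message: bytes) -> str:
    return hmac.new(key, message, hashlib.sha256).hexdigest()

def server_accepts(message: bytes, tag: str) -> bool:
    # Constant-time comparison, as any HMAC verifier should use.
    return hmac.compare_digest(authenticate(message), tag)

request = b"GET /retrieve/2021-01-20"
tag = authenticate(request)
```

Since every copy of the app must be able to compute this tag, the "secret" key necessarily ships inside the app, which is why it could be extracted.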

Fun Activity #3

  • How many infected people upload their secret codes?
  • Every day the app creates a new Temporary Exposure Key (TEK)
  • The app's broadcast Bluetooth signals are derived from today's TEK and change every 15-20 mins
  • Somebody downloading the TEKs will be able to link Bluetooth signals derived from the same TEK
  • Upon an infection, the TEKs for yesterday, two days ago, ..., 13 days ago are uploaded
  • Today's TEK will be uploaded tomorrow
  • More generally: for the next 14 days, the previous day's TEKs will be uploaded.

Going from TEKs to number of infected Users:

  • Each TEK comes with a validity period, typically one day
  • At midnight, download the TEKs uploaded the previous day and try to find sequences of length 13
  • Sequences may not be complete
  • Incomplete sequences from entire day are garbled together and may result in fewer sequences
  • Increase download frequency: every hour, download the TEKs uploaded so far today
  • The difference between two subsequent downloads gives us the TEKs uploaded the previous hour
  • Will miss sequences where there were no TEK for yesterday
  • Will miss parts of a sequence after a gap
  • Sequences of length 1 are either newly reported infections or follow-ups for an already reported infection
  • #infections = sum of all sequences with length at least 3 (somewhat random choice)
  • Will miss sequences with a gap the day before yesterday or two days ago
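The counting heuristic above can be sketched roughly: diff subsequent downloads to get each hour's newly uploaded TEKs, treat each batch as one user's upload, and count batches of length at least 3 as infections. The data shapes are hypothetical; the real TEK exports are binary files, and sequence assembly is more involved:

```python
# Rough sketch of the counting heuristic above. Each hourly diff is
# treated as one upload batch, and batches with >= 3 TEKs count as a
# newly reported infection (the "somewhat random choice" of 3 from the
# talk). Data shapes are hypothetical; real TEK exports differ.

def new_teks(prev_download: set, curr_download: set) -> set:
    # "The difference between two subsequent downloads gives us the
    # TEKs uploaded the previous hour."
    return curr_download - prev_download

def count_infections(upload_batches, min_len=3):
    return sum(1 for batch in upload_batches if len(batch) >= min_len)

hour1 = {("userA", day) for day in range(1, 14)}  # 13 TEKs: new report
hour2 = {("userB", 13)}                           # length 1: follow-up

downloads = [set(), hour1, hour1 | hour2]         # cumulative downloads
batches = [new_teks(a, b) for a, b in zip(downloads, downloads[1:])]
```

As the talk notes, gaps and garbled overlapping uploads make the real assembly lossy, so this undercounts.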

Urs's numbers are 10% below the ones reported irregularly by Health Canada

10% of people who have entered an OTK have not uploaded their codes

Numbers now match

Significantly fewer follow-up uploads than expected; are users not giving consent?

Spikes are not a good predictor for spike in overall infection numbers

Fun Activity # 4

  • The development of contact tracing apps started in April 2020 to assist manual contact tracing
  • Unfortunately, there was not much discussion on why this was the most useful app to have, given the limitations of smartphones and the propagation behaviour of the virus
  • Magic exposure condition of 2 meters for 15 mins
  • 2 meters is too simplistic; other factors like environment and air flow matter
  • Bluetooth signal attenuation is not a good proxy for estimating distances

The main "discussion" was about centralized vs decentralized architecture, led mainly by security/privacy researchers

  • Privacy by Obscurity
  • Decision about type and design of app was made by Google/Apple
  • So we ended up with not particularly useful but privacy-protecting exposure notification apps
  • Arguably, some of the privacy is "privacy by obscurity"
  • Android/iOS hide precise time and duration of exposure from alerted users

COVID Alert and Backward Tracing

  • Fall 2020: infection numbers started to rise again, importance of backward tracing emerged
  • Can we use COVID Alert system for help with backward tracing?
  • Easy when deploying our own clients on non-Google Play/iOS devices
  • But most people use iOS or Android with Google Play
  • iOS filters exposure notification Bluetooth beacons from any third party app so there is not much we can do in iOS
  • Google will kick apps that specifically look for these beacons out of the Play Store

Making COVID Alert more useful

  • Let's write a Bluetooth scanner app that also records locations of sightings
  • RaMBLE already does this!
  • All we need is an app that processes RaMBLE's recorded sightings, tries to find matches in the recording, and visualizes them
  • Someone in Germany already did this!
  • Urs also contributed to an OpenStreetMap

RaMBLE and Warn-App-Companion

  • If you use COVID Alert and want it to be more useful in case of an exposure
  • Install RaMBLE and have it continuously record Bluetooth sightings
  • In case of an alert from COVID Alert, install the Warn-App-Companion app and use it to learn the precise time and location of the exposure
  • You can also periodically launch the Warn-App-Companion app to learn of exposures that did not satisfy the 15 mins and Bluetooth attenuation thresholds required by COVID Alert
2021-02-03 Masoumeh  

Privacy for [Correlated] Enterprise Communications

Project Setup:

Graph structure

Email communications

Publishing ngrams

  • Preserving information confidentiality - differential privacy

Assumption:

  • confidentiality captured in local communications (cliques)

General Goal:

  • Learn from enterprise communication without violating privacy
  • Differential Privacy in Social Networks:
  • Edge-level Privacy

Node-level Privacy

Correlation in Data Analysis

Conventional differential privacy: Participation of a record

Real picture: Inferential Privacy

Correlation in Ngram Analysis:

Conventional differential privacy:

Removing an edge (independent edges)

Our work:

Removing an edge and its correlated info

Group privacy

Correlation Models - Clique

Captures local communications, but infinite sensitivity!

Pufferfish Privacy (S,Q, O)

  • Secrets S, ngrams on the edges
  • Indistinguishable pairs Q existence/absence of ngrams in an edge
  • Data distributions set O captures correlation

The query F(X) The count of ngrams in the graph

Uses Wasserstein Mechanism

Gives an average correlation of edges

Open problem:

The more accurate the correlation model can be, the more private we can be

Applying sensitivity reduction techniques from node-level privacy

Ngram correlation

Other correlation models

Other neighbourhood models

Application to old NLP tasks (model building, topic modelling, ...)

2021-02-03 Andre   Automatic speaker verification

Perform voice biometric auth:

Two categories:

Text-dependent: User auth using same enrolment phrase

Text-independent: Arbitrary auth phrase spoken by the user

Text-prompted: Arbitrary auth phrase chosen by the system

ML based:

MFCC, LPMS, CQTT

Use cases:

Access control

Forensics

Surveillance applications

Problem:

Spoofing attacks:

Impersonation

Text to speech

Voice conversion

Replay

Spoofing countermeasures:

Add an extra layer of security

Classifiers that distinguish between spoofed and bona fide samples

ASV + CM - The next big thing! Banks use it!

Ultimate solution or a false sense of security?

What does the literature tell us?

Several attacks on ASVs

Attacks on CMs

ASV+CM?

  • Not a single attack! Impermeable? (No, just harder)

Threat model:

Target system employs ASV+CM

Attacker places call from a device they own

  • logical access

Text-prompted: Replay attacks won't work

Impersonation is not a realistic threat

Bottom line:

The attack needs to provide a sample that simultaneously evades the CM and the ASV system

The content is determined by the server, this has to be done in realtime

Systems vary drastically in terms of architecture and features

Adversarial examples:

Novel inputs crafted with some knowledge of the attacked system

The goal is to perturb an input sample such that:

  • Perturbation is imperceptible to a third party
  • The perturbation causes the target model to misclassify

Adversarial examples have been shown to be "transferable"

Challenges:

The inputs to the two systems are two different sets of features

Adversarial examples generators perturb the input features

Perturbing the two sets simultaneously is not possible

Adversarial perturbations are by nature "unstable"

  • alternating optimization

Paradigm shift:

To craft adversarial examples that can bypass the two systems simultaneously

We need to have the CM and ASV operate on the same input

Idea:

Incorporate the feature extraction stage into the attack pipeline:

When attacking a white box system, add another layer that extracts the features in realtime and then passes those to the next stage as part of the pipe

Ensure the model's feedback can be propagated back to the first stage

For gradient-based models, this corresponds to back propagation of the gradients

Now that the two models accept the same input, introduce perturbations to jointly minimize two loss functions

Implementation and models:

Used code on top of TensorFlow and PyTorch

Used Compute Canada (2 T4 cards, 16 GB)

Results:

Baseline: Low attack success, 2.25% at most

ASV, < 1% attack success

Our attack: 24% < success rate < 43%

What�s next?

Speech-to-text test:

To be considered a practical threat

Human evaluation

Future work:

Execute an end-to-end realtime attack

2021-02-10 Shannon

Papers:

Original Protocol:

Attack 1:

New Protocol:

Attacking the new protocol:

History of Authenticated Key Exchange

Unauthenticated Key exchange

  • Explicit auth using digital sigs
  • Implicit auth with keys as long-term credentials

Done before with Diffie Hellman

Can we do the same with (R)LWE based key exchange?

LWE problem:

Given (A, b), where A is a matrix and b = As + e mod q, find s

RLWE

Given (A, b), where A is an element of a polynomial ring and b = As + e mod q, find s
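A tiny LWE instance matching the matrix form above can be generated in a few lines. The parameters here are toy-sized and utterly insecure; they only make the statement of the problem concrete:

```python
import random

# Toy LWE instance matching the statement above: given (A, b) with
# b = A*s + e (mod q), recovering s is assumed hard. These parameters
# are far too small to be secure; they just make the problem concrete.

q, n, m = 97, 4, 8
random.seed(1)

A = [[random.randrange(q) for _ in range(n)] for _ in range(m)]  # public
s = [random.randrange(q) for _ in range(n)]                      # secret
e = [random.choice([-1, 0, 1]) for _ in range(m)]                # small error

b = [(sum(A[i][j] * s[j] for j in range(n)) + e[i]) % q for i in range(m)]
```

Without the error e, s could be recovered by linear algebra from (A, b); the small noise is what makes the problem hard at real parameter sizes, and the RLWE variant replaces the matrix with ring elements for efficiency.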

Attack 1:

Find S_B

p_a =??

Send

p_A=0

w_B = Sig(p_A s_B + 2g_B)

p_A = 1, p_A = 2, ...

s_B doesn't change

To find the sign, repeat the process using p_A = (1+x)k, k in {0, q-1}

We now have s_B or -s_B

Key exchange with reusable keys?

Claim: New signal function doesn't leak any information about the key

Protocol: Same as the previous one except for the computation of k_B and k_A

Ding, Branco, Schmitt (ePrint 2019/665): Leakage of signal function with reused keys in RLWE key exchange

Attack 2:

What's new with w_B?

It is constructed with As_B + known info + error

Fixed when p_A is fixed

Varies when Eve's "identity" changes

Send p_A = k, k in {0, q-1}

Collect signals when our known value is bounded by some h

Repeat with p_A = (1+x)k, k in {0, q-1}

The original attack took 3.8 hours, ours takes 1 min

Attack 2 took 50 minutes

2021-02-10 Georgios  

Leader Based Consensus Protocols

Blockchains can be broken into two groups:

  • Longest chain based
  • Voting based <- Focusing on these protocols

Most Distributed Ledgers today are Blockchains

Start at round 0, known as genesis block B^0

A leader is elected at round 1, proposed B^1 extending B^0

B^1, B^2, ... are eventually committed

What differentiates DLTs?

How participants/leaders are elected

  • Proof of work
  • Proof of Stake
  • Permissioned

How transactions are finalized

  • Blockchains (longest chain, voting)
  • DAGs: Avalanche, IOTA

Participant election process

PoW, PoS, Permissioned

Consensus finality rules

Nakamoto, classical, DAG

Classical Consensus: Deterministic finality through voting

Start at known genesis block B^0

A leader is elected at round 1, proposes new block extending old block

Validators run

Network model

Synchrony (Nakamoto)

  • Once an honest participant sees a message, every other honest participant will see it within some time

Asynchrony

  • messages take arbitrarily long to propagate but are eventually delivered

Partial Synchrony (Classical)

  • Network alternates between synchrony and asynchrony

Formalization: The network is asynchronous until an unknown Global Stabilization Time (GST); after GST it is synchronous

State of the art protocols

  • Algorand
  • Tendermint
  • HotStuff

In general, use proof of stake, but elect leaders and voters differently

Model:

  • Fixed number of N=3f+1 validators, with 2f+1 honest and f malicious
  • Leaders are elected by rotating validators in a round-robin fashion
  • Votes are aggregated through threshold BLS, so each validator receives O(1) aggregate votes per voting step
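The model above can be sketched in a few lines: N = 3f + 1 validators, round-robin leader rotation, and a 2f + 1 quorum for committing. The BLS aggregation is elided; votes are simply counted here:

```python
# Sketch of the model above: N = 3f + 1 validators, round-robin leader
# election, and a 2f + 1 quorum for committing. Threshold BLS vote
# aggregation is elided -- votes are just counted.

f = 1                       # tolerated malicious validators
N = 3 * f + 1
validators = [f"v{i}" for i in range(N)]

def leader(round_no: int) -> str:
    # "Leaders are elected by rotating validators in a round-robin fashion"
    return validators[round_no % N]

def committed(votes: set) -> bool:
    # 2f + 1 matching votes suffice even if f of them are malicious.
    return len(votes) >= 2 * f + 1
```

The 2f + 1 threshold guarantees any two quorums intersect in at least f + 1 validators, hence in at least one honest one, which is what prevents two conflicting blocks from both committing.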

The big takeaway is that the protocols looked at are all very simple. They could be explained to a strong undergrad in about an hour

2021-02-17 Oneeb

Papers:

Voice Assistants

  • Connected with smart devices (e.g. Amazon Alexa)
  • Ubiquitous (more than 3.25 billion voice assistant users in 2019)
  • Convenient to use
  • Appeals to people because of no physical interaction

The problem:

  • Concerns over security
  • Audio replay attacks
  • Skill squatting (Register same name, Word Obfuscations)
  • Access Control Vulnerabilities

Threat Model:

  • Access system functionalities
  • "Hey Alexa, order this for me"
  • Credential Theft

How are applications secured?

  • There is a manifest of components
  • There should be access control checks

Problems with existing permission re-delegation detection:

  • Hard to find these privileged code areas; the options are static analysis and dynamic analysis

Static Analysis -> a lot of manual effort but little output

Implementation is obfuscated

Dynamic Analysis:

  • Run-time analysis
  • Automation
  • Many existing tools, but limited to GUI based testing
  • Natural Language Processing might be required

Currently working on developing/extending a dynamic analysis tool that works for voice-assistant android applications

Currently exploring Snowdrop (2017)

2021-02-17 Vasisht   What are the different privacy risks for machine learning mechanisms that use graphs

Graph Embedding Algorithms

Low dimensional representation for graphs

Capture the original graph semantics

Released for further computation

(Recommendation systems)

Graph Neural Networks

State of the art performance

Use embeddings for downstream tasks

Link prediction node classification and clustering

Graph learning

Message Passing Algorithm

Weighted aggregation of features from neighbouring nodes

Graph NNs: Graph Data -> Embeddings -> Predictions

Similar to convolution over neighbouring nodes

Inductive training on different training and test graphs

Privacy Attacks on Graph Learning:

  • Membership Inference Attack
  • Graph Reconstruction Attack
  • Attribute inference Attack

Membership Inference Attacks

Overview:

Embeddings and GNNs overfit on training data

  • Distinct members and non-members

Adversary exploits this to infer membership status

Two attack scenarios: blackbox (predictions) and white box (embeddings)

Blackbox Membership Inference Attacks

Shadow Model attack

  • Attack preparation (train shadow model to mimic target model performance, train attack model to distinguish using prediction from shadow model)
  • Attack execution (Send output predictions from target inputs to attack model)
  • Leak membership status with 15% average inference advantage

Confidence Score Attack

  • Exploit: model more confident on seen training data than test data
  • Attack: If maximum confidence across all classes > threshold => member
  • Leak membership status with 28% average inference advantage
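The confidence-score attack above is simple enough to sketch directly: if the model's maximum class confidence on an input exceeds a threshold, guess "member". The prediction vectors and threshold below are illustrative, not numbers from the paper:

```python
# Sketch of the confidence-score attack above: models tend to be more
# confident on data they were trained on, so a high max confidence
# suggests membership. Vectors and threshold are illustrative only.

def confidence_attack(prediction_vector, threshold=0.9):
    return max(prediction_vector) > threshold

train_pred = [0.97, 0.02, 0.01]  # model very confident on seen data
test_pred = [0.55, 0.30, 0.15]   # less confident on unseen data
```

Unlike the shadow-model attack, this needs no auxiliary training at all, only the target's output confidences.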

Influence of Neighbouring Nodes

Increase range for aggregating features from neighbours by increasing layers

Attack and prediction accuracy decreases on increasing range

  • Over-smoothing of features reduces distinguishability

MIA influenced by structured data with preferential node connections

Whitebox Membership Inference Attacks

Attack Methodology

  • Attack Preparation (train encoder-decoder to get single value from embedding, cluster the membership values from the trained encoder)
  • Attack Execution (Embeddings for target input is clustered as member or not)

38% advantage

Graph Reconstruction Attack

Attack:

  • Prior: Aux subgraph sampled from target graph distribution
  • Attack Preparation (Train graph encoder-decoder on aux graph data)
  • Attack Execution (Trained decoder to generate graph from released target embedding)

Link Inference (Infer a potentially sensitive connection between two target nodes)

Were able to reconstruct the graph with high accuracy

Attribute inference Attack

Attack:

  • Prior (embeddings for some nodes and their attributes)
  • Attack Intuition (Preferential node connection with similar attributes, Neighbouring nodes mapped to close embeddings)

Attack Prep

  • ML Attack Model: Embeddings -> sensitive attributes

Attack Execution

  • Attack model outputs attributes for target embeddings

Conclusion

Releasing embeddings leaks sensitive information

  • Leak membership status of a user's node
  • Reconstruct target graph structure given the embeddings
2021-02-24 Florian  

oins Over Encrypted Data (Practice Talk)

Context: Cloud computing is here in a big way (revenue reaching $360 billion by 2022)

Security Challenges:

What were/are the greatest barriers for adoption?

55% Security concern!

54% Geographic (Also related to security from gov policies)

"Can I trust my data with a cloud provider?"

Adversaries in the cloud:

Malicious insiders

Criminals after high-pay off targets

Nation state actors

There is a solution to this!

Encrypt your data, upload it to the cloud, perform computation on the encrypted data, and then download it

(Homomorphic encryption, at least theoretically, is feasible for this)

As it stands, Homomorphic encryption is not capable of performing database operations efficiently enough

Database as a service:

In the cloud, we have a Database management system

Client issues queries over network

This can be part of a data centre infrastructure

Joins:

Implementing secure joins is extremely difficult

Without revealing the join pattern, you have to include everything (the result table is n^2)

Revealing the join pattern

Reveals the frequency

Detrimental in cryptanalysis

Non-interactive:

First attempt: (HILM02)

Deterministically encrypt each value

  • No need to change join algorithm in DBMS
  • Very fast
  • Snapshot attacker can retrieve join pattern
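The first attempt above can be sketched with a keyed PRF standing in for deterministic encryption. The key, table shapes, and values are made up for illustration:

```python
import hmac, hashlib

# Sketch of the first attempt above (HILM02-style): deterministically
# encrypt join values, so the DBMS's ordinary equality join still works.
# Equal plaintexts yield equal ciphertexts, which is exactly why a
# snapshot attacker can retrieve the join pattern and value frequencies.
# Key and table contents are made up for illustration.

key = b"demo-key"

def det_enc(value: str) -> str:
    # Keyed PRF as a stand-in for deterministic encryption.
    return hmac.new(key, value.encode(), hashlib.sha256).hexdigest()

employees = [("alice", det_enc("sales")), ("bob", det_enc("hr"))]
departments = [(det_enc("sales"), "building A"), (det_enc("hr"), "building B")]

# The server joins on ciphertexts with no scheme-specific algorithm.
joined = [(name, loc) for name, c1 in employees
          for c2, loc in departments if c1 == c2]
```

The join itself is as fast as a plaintext join; the cost is that anyone who sees the stored tables sees which rows match which, without decrypting anything.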

Second attempt: (PRZB11)

Onion-encrypt each deterministic encryption

  • Cold database does not leak any info
  • joins on hot database very fast
  • hot database leaks join pattern

Joins over encrypted data with fine-granular security (IEEE ICDE 2019)

A different way to onion wrap

  • Key-policy attribute-based encryption (KP-ABE)
  • Encrypt data with attributes
  • Generate key for specific attribute policy (selection values in the query)
  • Decryption only works for cipher texts whose attributes match the policy

Conclusion of fine-granular security:

KP ABE-encrypt deterministic encryption

  • Hot database only partially decrypted
  • Joins on hot database still fast
  • Series of queries leak more than their union

Can we remove the leakage?

Yes!

Equi-joins over encrypted data for series of queries (coming soon, arXiv 2021)

  • Use inner-product encryption, such that the decryption results match

Inner-product encryption:

  • encrypt a vector of numeric plaintext
  • Generate key for that vector
  • Decryption results in one-way function of the inner product

Security claim:

The leakage of a series of queries is the union of their join patterns:

Proof idea:

Consider a pair of decrypted dot products

  • Case analysis over (non-)equal query id/join value/selected attributes (8 cases)

2 examples:

  • Different query id: different keys k
  • Selected attributes don't match: random dot products

Assuming n dot products there are n(n-1)/2 potential collisions (birthday paradox)

Performance results:

Performance depends on the number of attributes and possible selection attribute values

Encryption: 3.4ms to 9.6 ms (per update)

Key get: 2ms (per query)

Decryption 21.2 to 53 ms (per selected database tuple)

  • Hash-based join algorithm

Operations are all essentially linear!

Conclusion of Series of Queries

IPE-encrypt join value and query-dep key

  • Leaks only union of the join patterns
  • Snapshot attack can be unsuccessful
  • Join on hot database can be slow

What can we do with a single round of interaction?

Oblivious joins:

Perform the same instructions and memory accesses for a join of tables of the same size n independent of the input data

  • Circuit computational model

Introduced as sovereign joins in 2006 (AAKL06)

Best previous solution O(n log^2 n) runtime and O(log n) oblivious storage

  • Not implemented in practice

Sorting Network:

Circuit to obliviously sort an array

We will use it extensively as a building block

Complexity O(n log n) in theory and O(n log^2 n) in practice

First step:

Merge the two tables and sort them on the join attribute

Linearly scan the table and count the occurrences of each value in each table and augment with the count in the other table

We can now compute the size of the result table in a linear scan

Separate the two tables, keeping the count from the other table
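The counting idea in the first step can be sketched as follows. An ordinary sort stands in for the sorting network here (the real version needs data-independent compare-exchange sequences to be oblivious), so this only shows how the linear scan yields the result size:

```python
# Sketch of the first step above. An ordinary sort stands in for the
# sorting network (the real version uses data-independent
# compare-exchange sequences to stay oblivious); this just shows how
# the merge + sort + linear scan yields the join result size.

def join_result_size(table_r, table_s):
    # Merge both tables, tagged with their origin, sorted on the
    # join attribute.
    merged = sorted([(v, "R") for v in table_r] +
                    [(v, "S") for v in table_s])
    # Linear scan: count occurrences of each value in each table.
    counts = {}
    for value, side in merged:
        counts.setdefault(value, {"R": 0, "S": 0})[side] += 1
    # The equi-join result size is the sum of per-value cross products.
    return sum(c["R"] * c["S"] for c in counts.values())
```

Knowing the result size up front is what lets the second step allocate and align the output table without data-dependent memory accesses.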

Second Step:

Align the patterns using a formula based on the join count

Conclusion of Oblivious joins:

Single round of interaction can reduce the leakage to only the output size

Hardware encryption does have comparable performance

Final thoughts:

You cannot have all three of the following:

Non-interactive, single server

Efficient

No leakage

2021-03-03 Setareh  

Constant-time Verification for Long-Lifetime Byzantine Fault Tolerant State Machine Replication

Main Idea:

Permission-based BFT - SMR protocols are commonly used for distributed ledgers

Shared Secret

  • Threshold signature -> constant time verification

An optimized version of MPSS

Sending out the proposals:

1. Create a new polynomial whose value at 0 is 0

2. Create a set of polynomials of the same degree for the members in the current epoch

3. Generate a new set of encrypted values such that each individual can only decrypt their values

Proposals are sent to aggregate and individuals check if they agree

Resharing: New group perspective:

Reshare - Receipt messages

Compute new shares

P(x) + Q(x) + R_k(x)
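The refresh step above can be sketched over a toy prime field: a fresh random polynomial with value 0 at x = 0 is added to each share, so the shares change while the shared secret (the value at 0) does not. Parameters and values are illustrative, and the encryption/verification machinery of MPSS is elided:

```python
import random

# Sketch of the resharing step above: add a fresh random polynomial Q
# with Q(0) = 0 to each share, so shares change but the secret (the
# value at 0) does not. Toy parameters; MPSS's encryption/verification
# machinery is elided.

P = 2087         # small prime field modulus
DEGREE = 2       # polynomial degree (threshold - 1)

def poly_eval(coeffs, x):
    return sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P

def fresh_zero_poly():
    # Constant term 0 => value at 0 is 0, as in step 1 above.
    return [0] + [random.randrange(P) for _ in range(DEGREE)]

secret = 1234
shares = {i: poly_eval([secret, 55, 77], i) for i in range(1, 4)}

q_poly = fresh_zero_poly()
new_shares = {i: (s + poly_eval(q_poly, i)) % P for i, s in shares.items()}

def reconstruct(share_map):
    # Lagrange interpolation at x = 0.
    total = 0
    for i, y in share_map.items():
        num = den = 1
        for j in share_map:
            if j != i:
                num = (num * -j) % P
                den = (den * (i - j)) % P
        total = (total + y * num * pow(den, -1, P)) % P
    return total
```

Both the old and the refreshed shares reconstruct the same secret, which is exactly what lets the group rotate shares each epoch without ever reassembling the secret.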

Share recovery

  • Broadcasts a share-recovery

Security Evaluation:

Safety: If two statements x and x' are signed with sequence number j, then x = x'

  • true for first epoch and any epoch after

Liveness: A properly-issued request will eventually complete

  • Normal requests
  • Membership change requests

The implementation is built as a Rust library

2021-03-10 Faezeh  

Data sharing:

Datasets are often released for researchers to use

How do we prevent linkage attacks while still providing meaningful datasets?

Traditional Methods:

  • Strong assumptions on the knowledge and identification strategies of the attacker
  • Evaluating by simulating the attacker based on assumptions

Netflix Prize dataset (combined with the IMDb dataset, it leaked users' info)

Differential Privacy:

An output of an algorithm should not be that different if an entry is removed or added

Differentially Private Synthetic Data:

Team DPSyn

  • Second in all three competitions run by NIST

Pre-processing:

  • Clustering the variables so that similar variables are in the same cluster
  • Construct 1-,2-,3-way marginals between all variables in each cluster
  • Perturb the counts using Gaussian Mechanism

Post-processing:

  • Using some techniques to check the consistency among the marginals
  • Sample from the noisy marginals and generate synthetic data
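The perturb-then-sample pipeline above can be sketched for a single 1-way marginal. The noise scale, categories, and counts here are illustrative; in DPSyn the scale is calibrated to the privacy budget and marginals are made consistent across overlapping clusters first:

```python
import random

# Sketch of the DPSyn-style pipeline above for one 1-way marginal:
# perturb counts with Gaussian noise, clamp negatives, then sample
# synthetic records in proportion to the noisy counts. The noise scale
# and data are illustrative; real DP calibrates sigma to the budget.

random.seed(0)
SIGMA = 2.0   # illustrative noise scale

marginal = {"age<30": 40, "30-60": 50, ">60": 10}   # true counts

noisy = {k: max(0.0, v + random.gauss(0, SIGMA)) for k, v in marginal.items()}

def sample_synthetic(noisy_marginal, n_records):
    categories = list(noisy_marginal)
    weights = [noisy_marginal[c] for c in categories]
    return random.choices(categories, weights=weights, k=n_records)

synthetic = sample_synthetic(noisy, 100)
```

The synthetic records preserve the rough shape of the distribution while each individual's presence is masked by the noise on the counts.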

Team Gardn999

  • fifth and fourth in competition 2 and 3

DPFieldGroups Algorithm

Pre-processing:

  • Clustering the highly correlated variables
  • Perturb the counts using Laplace Mechanism

Post-processing

  • Decreasing the noisy counts to zero if they are below a threshold

Sampling from the noisy histograms with weighted probability and generate the synthetic data

Team PrivBayes:

Fifth place in competition 1 and third in 2 and 3

Algorithm:

  • Calculate the level of correlation between each pair of attributes and save it as a score for each pair
  • Perturb the scores using the Gaussian Mechanism
  • Use the noisy scores to create a Bayesian network
  • Approximate the distribution of the original data
  • Perturb the P marginals via the Gaussian Mechanism
  • Use the noisy marginals with the Bayesian network
  • Generate the synthetic data by sampling tuples

Team RMcKenna

  • third in competition 1, fourth in 2, and first in 3
  • a blend of parametric and non-parametric approaches

Algorithm:

  • Identify the highly correlated variables on a public data set
  • Perturb the 1-,2-,3-way marginals via Gaussian Mechanism
  • Use graphical models to determine a model for the data distribution

Team UCLANESL

  • fourth place in competition 1, fifth in 3

Uses Wasserstein generative adversarial network

Algorithm:

  • Clip the discriminator gradient updates to ensure bounded l_2 sensitivity
  • Add noise from the Gaussian Mechanism to the gradient updates of the discriminator
  • The moments accountant technique will abort the WGAN training if the privacy budget has been reached

Comparing parametric and non-parametric approaches:

Non-parametric:

  • similar functions in different algorithms
  • much less computational complexity
  • more pre-processing work
  • assume access to accurate public data

Parametric Approaches:

  • various functions in different algorithms
  • more computational complexity
  • less pre-processing work (in some algorithms)
2021-03-17 Akshaya  

ZXAD: Zero-knowledge Exit Abuse Detection for Tor


Tor Exit Abuse:

  • Traffic exits through publicly known exit nodes
  • Attackers use Tor to attack services, which causes services to block all Tor traffic

Circuit-level Attacks:

  • Client tries to send lots of streams through a single circuit
  • Exit can rate-limit the number of streams per circuit

Exit-level Attacks:

  • Client sends multiple streams via multiple circuits
  • Traffic from different circuits is unlinkable

Tor-level Attacks:

  • Client uses multiple circuits and multiple exits
  • Exits do not have a global view of the network

ZXAD high level:

  • Client is allowed to send a specific number of tokens to a server
  • Limits the number of unlinkable connections to a destination
  • Exits can mitigate Tor-level attacks by giving the server a hash of the token to differentiate users


Unlinkability:
ZXAD relies on the existence of a unique ID for every client
Websites use IP addresses to block abusers outside of Tor

  • IP Addresses are neither permanent nor unique

ZXAD is general enough to be adapted to any unique ID

  • even ones with low entropy

(We are assuming client has a unique ID)

Participants and Threat Model:

  • Clients are malicious
  • Majority of DirAuths are honest
  • Exits and servers can collude with malicious DirAuths

Long term keys are generated and given to DirAuth for signing keys
Periodic signing keys are used for the client

Circuit Token Showing and Verification
Client sends the circuit token and the proof along with the first stream token in a circuit
Circuit Token creation is expensive, but can be done offline

Stream Token Showing and Verification
Client sends the stream token for every new destination within a circuit

Performance
~8.55 ms of client-side computation
31 seconds on 8 cores of exit-side computation per 10-minute epoch for the busiest node

2021-03-17 Zeinab   A Probabilistic Approach to Recommending Access Control Checks in the Android Framework

Main focus is on the Android Framework Layer

Framework Layer: App developers use this layer to access services (such as gps)

API Permission Requirements
Many APIs require the caller to hold a set of permissions and pass UID/user checks

Access Control Inconsistency
Say two APIs want to get access to a Sink
One API requires root, the other doesn't. An attacker would just use the lower permission level

There is no regulation of framework-level customization

Tools to detect these inconsistencies
Kratos:
Not path sensitive, does not model user checks
Lots of false positives since it detects syntactically inconsistent permission enforcement


AceDroid:
Can conduct cross-image inconsistency analysis
Path sensitive
Considers a larger number of security checks
Normalizes security checks

ACMiner
If one path leads to a Security Exception, another will lead to a protected operation

  • Looks at conditional statements on paths leading to a SecurityException to identify potential authorization checks
  • Domain expert filters through list of potential authorization checks
  • Performs consistency analysis by uncovering relations between sets of auth checks

Not practical, need to define auth checks


The Problem with Inconsistency Detection:

  • May exist for a legitimate reason
  • e.g. an API that doesn't do permission checks before using an operation, whereas another one does


A New Approach:

  • Confidence Level: Resource X should require the following protection

Constraint Collector:

  • Collect hints from the codebase that suggest a 1:1 resource to protection mapping
  • Turn these hints into prior-probability constraints
  • Collect hints that denote relations between resources
  • Turn these hints into implication constraints

Probabilistic Inference Engine:

  • Transform constraints into factor graph and perform belief propagation
  • Output posterior probability

Considerations:

  • What happens if different constraints contradict one another?
  • What happens if a resource is accessed only in one API?
  • When should a recommendation be made?
2021-03-24 Peyman

Papers:

Time Capsule

Front-running in Blockchains

Front-running is when someone gets a benefit from early access to market information about trading
Typically because of a privileged position

In Blockchain, everyone has access to Privileged Info
Miners are in a more privileged position
Miners can be bribed by transaction fee/ gasPrice

Attack types:
Displacement - it is not important to the adversary that the original function call runs after her function
Insertion - it is important to the adversary that the original function call runs after her function
Suppression - run her function and delay the original function call


Mitigations:
1) Transaction Sequencing

  • FIFO (Not possible on a distributed network)
  • Trusted third party
  • Pseudorandom Sorting
  • Transactions themselves

2) Confidentiality
Components of a transaction

  • Code of DApp
  • Current state of DApp
  • Name of function being invoked
  • Parameters supplied to function
  • Address of the contract the function is being invoked on
  • Identity of the sender

Commit/Reveal Strategy
1) Commit Hash or Encryption of transaction
2) Reveal transaction details after finalization of block

Cons:
Requires 3 transactions for each order


Other approaches:
Time Capsule [Doweck, Eyal 2020]
Identity-Based Encryption

2021-03-24 Jason  

Talked about an event he ran with the goal of getting high school students interested in computer security

  • Simulated a CTF to try to keep people engaged
  • Participants had next to no technical background
  • Challenges mostly focused on information gathering and using that information to break things
  • Overall event went well, and several participants attended future events
2021-03-31 Xinda  

Certified Robust Word Embeddings for Textual Classification via Differential Privacy

Textual Adversarial Attacks:

Example 1: Character substitution

Character insertion/deletion

Textual Adversarial Attacks

Example 2: Multiple character-level changes on a sequence

Word substitution

NLP Background (BERT)

Pre-training and Fine Tuning

Word Embeddings

Looking at Character BERT

  • More robust than BERT, but still susceptible to adversarial attacks
  • instead of looking at words, look at each character

Adversarial Robustness Definition

A small change in the input does not alter the scores so much as to change the predicted label

Certified Robustness via DP in Image Classification

  • PixelDP: Let A be a randomized scoring function that takes an input x and outputs a response

We say A is (epsilon, delta)-PixelDP

Certified Robustness via DP in Image Classification

  • Bound the l2-norm sensitivity of pre-noise layers
  • Add calibrated noises during prediction (and training)
  • Test the condition

Challenges in NLP Setting

  • The p-norm cannot be used here. We need a new distance metric to measure perturbations
  • Bounding sensitivity of g(.) with regards to the new distance metric is non-trivial

Related Work

Certified Robustness of Word Substitution

  • Bound the sensitivity for a pre-defined adversarial set

Limitation:

  • Only for word substitution attack
  • Adversary set is a set of words selected in advance
  • The bound may be loose

Levenshtein distance

  • How do we define meaningful perturbations?
  • Distance metric: Minimum Edit Distance (Levenshtein distance)
  • lev("cat", "bat") = 1
  • lev("dogs", "dg") = 2
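The Levenshtein distance used in these examples can be computed with the standard dynamic-programming recurrence:

```python
def lev(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and
    substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

assert lev("cat", "bat") == 1
assert lev("dogs", "dg") == 2
```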

Our Solution 1 (Posterior Privacy)

  • Levenshtein Ball (center C, radius r): represents all points in the ball
  • Levenshtein Shell (C, r)
  • Determine the maximum perturbation radius r_0
  • Randomly draw r with a probability distribution
  • Uniformly sample a sequence from the Levenshtein shell

Solution 2 (Distance-aware Mapping)

  • An ideal mapping maps an input string to a high dimensional space
  • We leverage some basis points and the triangle inequality to achieve Lipschitz continuity
  • Exponential complexity bound on # of anchor points
  • Hamming distance
2021-03-31 Shelly  

Real-time Attacks Against Deep Reinforcement Learning Policies

Reinforcement Learning Process

  • Markov decision process: described by state, action, reward, and environment at each time step
  • Goal: Find optimal policy that maximizes the cumulative reward/return

Adversarial Examples

  • Fool classification model into misclassifying certain samples
  • Given an input x we want to find a modified version x* such that x and x* are similar but f(x) ≠ f(x*)

Focusing on attacks that perturb states by adversary

Motivation - Real-time Attack

Current state-of-the-art attacks:

Perturbations are computed using backpropagation or by solving optimization problems iteratively at each time step (too slow!)

Adversary Model - Untargeted, White-box

Adversary attack goal:

  • Effective attack: lower the victim agent's return
  • Efficient attack: can be realized in real-time
  • Evades known defences

Adversary Capabilities:

  • Knows the agent's policies
  • Adversary is capable of perturbing states input to the agent

Proposed Attack - UAP-O and UAP-S

Use precomputed universal adversarial perturbation mask for attack

  • The adversary generates the adversarial perturbation mask offline
  • The adversary only applies the perturbation mask at test time; no other computation is done at test time
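The test-time attack is then just a single precomputed addition per step. The sketch below uses a random placeholder mask and an assumed 84x84 state shape; the real mask is optimized offline against the victim policy.

```python
import numpy as np

# Offline phase: the adversary computes one universal perturbation mask.
# (Placeholder here; the real mask is optimized using the policy weights.)
mask = np.random.uniform(-0.005, 0.005, size=(84, 84))

def perturb(state):
    """Test-time attack: a single addition per step, so it runs in
    real time, then clip back to the valid observation range [0, 1]."""
    return np.clip(state + mask, 0.0, 1.0)
```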

Two new attacks:

  • UAP-S: state-agnostic universal adversarial perturbation
  • UAP-O: observation-agnostic universal adversarial perturbation

Attack Design:

To generate UAP-S and UAP-O

1) Adversary first collects a training set by monitoring the environment for one episode

2) Adversary sanitizes the training set by choosing only the critical states by using a relative action preference function

3) Generate UAP-S and UAP-O using victim policy weights

Evaluation - Efficiency

  • While FGSM is the most effective attack, it is too slow to be applied in real-time
  • The other attack, obs-seq-fgsm-wb, while similar to our approach of universal adversarial perturbation, requires the agent to pause the environment mid-episode to compute an adversarial mask for the entire game
  • Obs-seq-fgsm-wb(U) is our modified version of the attack where the adversarial mask is generated offline instead of mid-game

Evaluation - Effectiveness

  • FGSM is the most effective attack in all games, but cannot be mounted in real-time
  • UAP-S performs better than or comparably to obs-seq-fgsm-wb and obs-seq-fgsm-wb(U)

Future work:

Defense Methods:

  • Detect when an agent is under attack
  • Mitigation - Mitigate the damage

Weaker Adversary Model:

  • Explore possible attack where the adversary only has black-box access
2021-04-07 Ezz  

Variational AutoEncoders

Phones have a lot of sensors

Implicit and Continuous Auth

When not using the phone, it is still capturing a lot of data

Use this to keep people logged in

Want to be resilient to malicious actors abusing this

Use VAE to detect outliers and let someone else handle the response

2021-04-14 Emily  

Efficient Differential Privacy Mechanisms with High Utility for Prio

What is Prio?

  • Used by Mozilla to collect user stats
  • Aggregate statistics from users using MPC

Overview:

  • Clients secret share their data with servers, along with a proof that their data is valid
  • Servers publish an aggregation of the valid data

Users send data and a secret-shared non-interactive proof (SNIP)

Servers validate SNIP before adding data to their local aggregate

SNIP provides robustness against malicious clients

Prio ensures privacy of clients assuming one honest server

Does not provide robustness

Problem:

Reconstruction attacks can be used to recover individuals' data from aggregate statistics

Adding dp would protect against such attacks

Naive Constructions:

Local DP:

  • Each client adds noise to their data before sending it to servers

Pro: Clients don't need to trust a central party

Cons: Clients need to add the correct amount of noise

Distributed Noise Generation Between Servers

  • Dwork-MPC

Achieves (epsilon, delta)-dp

Noise generated and added to the result by servers

  • each server generates N/k bits
  • Secret-share these bits with all other servers

Threat Model:

  • Data curator: honest but curious
  • Clients: assume the proportion of misbehaving clients is below some acceptable bound
  • Servers: trusted to follow protocol

Ideal construction:

  • Achieves high utility like the central DP model
  • Efficient

DPrio

1) Each client generates Laplacian noise and secret-shares it with the servers

2) Servers collaborate to randomly choose a particular client's noise

3) Servers aggregate data from all clients and add the reconstructed share of noise

4) Publish result to analyst

Server Noise Selection: Servers use a commit-reveal protocol to select a client's noise

  • Each server selects a random number
  • Commit: each server computes and publishes the hash of its random number
  • Reveal: the servers open their commitments
  • Honest servers verify that each server correctly published its hash
  • The servers sum all the revealed random numbers mod N
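The commit-reveal selection can be sketched as follows (a simplified toy: a real protocol would commit to a nonce-blinded value so small random numbers cannot be brute-forced from their hashes):

```python
import hashlib
import secrets

N = 1000  # number of clients whose noise shares were submitted

def commit(r: int) -> str:
    """Hash commitment to a server's random number (simplified:
    no blinding nonce)."""
    return hashlib.sha256(str(r).encode()).hexdigest()

# Commit phase: each server picks a random number, publishes only its hash.
randoms = [secrets.randbelow(N) for _ in range(3)]
commits = [commit(r) for r in randoms]

# Reveal phase: servers open their numbers; everyone verifies the hashes.
assert all(commit(r) == c for r, c in zip(randoms, commits))

# The selected client index is the sum of all revealed numbers mod N,
# so no single server controls the outcome.
chosen_client = sum(randoms) % N
```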

Privacy: theoretically achieves epsilon-DP

  • Relies on the security of secret sharing mechanism

Robustness

  • a set of malicious clients cannot influence the final aggregate beyond their ability to choose arbitrary valid inputs
  • SNIPs ensure values can be verified, assuming all servers are honest
  • can only verify that input is a valid noise value, not that it was sampled correctly

Client-DPrio

  • All clients sample noise from a normal distribution, adjusted based on assumed number of malicious clients
  • Servers sum all noise from all clients which results in DP Gaussian noise

Privacy:

  • Achieves (epsilon, delta)-dp

Malicious clients may submit incorrectly sampled noise

Server-DPrio

Privacy: Achieves (epsilon, delta)-DP if at least one server is honest

Robustness:

  • Robust against adversarial clients when all servers are honest (same as Prio)
  • Don�t need to worry about noise added from malicious clients
2021-04-21 Justin  

Updates in the World of Signal

All the Numbers are US: Large-scale Abuse of Contact Discovery in Mobile Messengers:

  • Signal uses phone numbers as the identifiers
  • Privacy concerns: Have to upload all of your contacts to signal
  • Use hashes before uploading to servers!
  • Phone numbers have low entropy, so hashing doesn't give you much protection.
  • You can just generate a database of every phone number for a country, it only takes 13 hours and 3.3TB of storage
  • Authors use a rainbow table and construct a new reduction function
  • Authors found that Signal was rate limiting at 120k contacts per day
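The brute-force weakness is easy to see in a toy sketch. This is a plain lookup-table version of the attack over a hypothetical +1-519-555 number range (the paper uses rainbow tables to trade storage for time, and real country number plans are larger, but still only around 10^10 values):

```python
import hashlib

def sha256_hex(s: str) -> str:
    return hashlib.sha256(s.encode()).hexdigest()

# Enumerate a (toy) number range and build a hash -> number reverse lookup.
reverse = {sha256_hex(f"+1519555{i:04d}"): f"+1519555{i:04d}"
           for i in range(10_000)}

# Any "protected" hashed contact is now trivially reversed.
leaked_hash = sha256_hex("+15195551234")
assert reverse[leaked_hash] == "+15195551234"
```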

Solution:

  • Have a big database of everyone, and a small database of new users
  • Rate limit the big database with a small limit
  • Rate limit the small database with a large limit

Signal integrating payments in app

  • Integrating "MobileCoin"
  • Takes place inside Intel SGX enclaves (not known for their security)
  • Built on Stellar

Takeaways:

  • Stop trying to make hashing low-entropy identifiers a thing
  • Politicians secretly love signal
  • Someone should probably figure out private set intersection
  • MobileCoin exists; it's probably fine, I don't know
  • The Internet doesn�t understand software licenses
2021-05-12 Ian M  

Usual strategies are failing us when it comes to COVID

Threat landscape:

Fake ventilators

Salvaged equipment with non-OEM software

Open source medical devices

Supply chain security problems

Numerous problems with COVID test kits

EU MDR delayed introduction because of COVID

An intriguing security spec in EU MDR:

Software must demonstrate that risk is as low as possible

Post market surveillance to be required for software used in medical devices

Billing fraud

Debunked - hospitals aren't recording COVID as a cause of death to get money

The payment amounts are correct, but there is no evidence of upcoding

E-distribution of false and misleading information/advertising

667 investigations launched in Canada so far

Money laundering

Significant fall off of casino cash

Payment claim fraud

In the US:

Paycheck Protection Program (120 defendants so far)

Economic Industry Disaster Loans

UI fraud

In Canada:

Canada Emergency Wage Subsidy - aimed at employers, No SIN check, no viability check

CERB - aimed at employees, supposed to get EI or CERB not both (expecting to take 4 years to sort out)

Using websites to sell nonexistent PPE, or PPE at inflated prices

The synthetic identity problem:

A collection of digital artifacts that purport to represent a real person, but there is no real person

Valuable for criminals to use for financial fraud

A twist on synthetic identity fraud: Ontario's Support for Families program was abused

  • 400 bank accounts opened at the same institution in a month
  • Ontario benefit applications were submitted by holders of these accounts

Government under pressure to allow more than one payment to a family, so rule changed to allow >5 payments to same application

  • 10,000 payments made to fake participants

How did they do this?

  • Mallory was an insider
  • Mallory was senior IT leader responsible for the requirements specs for the benefit application program
  • Mallory was also the person who recommended the "fix" to allow >5 payments per family
  • Mallory was married to the provincial government IT manager responsible for disbursing funds
  • Smart enough to wipe the government issued phones, but not remove all digital footprints
  • Had outside "help"

The drive for speed and simplicity in getting help to people made the programs too easy to abuse

2021-05-12 Nils   SoK: How Robust is Deep Neural Network Watermarking

Provenance Verification

Training a DNN is expensive

A single entity provides a DNN that is consumed by many

Model Stealing is a threat

Ability to detect stolen models is a powerful deterrent

DNN Watermarking

Watermarking is a method for provenance verification of DNNs

Embeds a message into a model that is extractable using a secret key

Robustness

An attacker cannot derive surrogate models from access to the source model that do not retain the watermark

Watermark Retention: Success rate for matching bits of the embedded and extracted message
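The retention metric is simply the bitwise match rate between the two messages, e.g.:

```python
def watermark_accuracy(embedded, extracted):
    """Fraction of message bits that match between the embedded and the
    extracted watermark (the retention metric described above)."""
    assert len(embedded) == len(extracted)
    matches = sum(a == b for a, b in zip(embedded, extracted))
    return matches / len(embedded)

assert watermark_accuracy([1, 0, 1, 1], [1, 0, 0, 1]) == 0.75
```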

Problem

  • Robustness has only been studied in isolation against a relatively small set of attacks
  • No systematic, empirical evaluation of robustness claims
  • Uncertainty about a scheme's robustness makes it difficult to trust its deployment

Watermarking Scheme Categories:

Model dependent: requires access to the model

Model independent: no access to the model required

During inference (active): requires control over the source model's deployment

Removal Attack Categories

Input Preprocessing

Model Modification

Model Extraction

Adversary Model

Goals: Derive a surrogate model

  • without retained watermark
  • That is well-trained
  • And reduce resources needed for the attack

Decision Threshold

  • Measure the watermarking accuracy as the success rate between the extracted and embedded message
  • What is the lowest watermark accuracy to verify a stolen model?
  • Surprisingly, a method to derive this decision threshold is missing from all surveyed papers
  • It is difficult to determine, because we do not know the watermark accuracy of unmarked models
  • We propose a generic method to estimate a scheme�s decision threshold

Results:

A lot of attacks break the watermarking schemes.

Is there one attack that breaks all watermarks?

There is a dominant attack!

Computational Resources: restricting computational resources is not realistic. 100h of GPU time costs about $43 USD

Dataset Availability: plays a significant role for the attacker

Availability of Pre-Trained Models: platforms exist for sharing pre-trained models for free

2021-05-19 Stan  

Reputation (or, Why we did the things we did)

What is a reputation system? A rating system (seller reviews, reddit, ...)

  • Collecting and combining opinions on some thing
  • The belief a person may have about what some group of people believe about another individual
  • A technology long used by human society to organize how people interact with each other

Reputation is used to incentivize good behaviour

Problems:

  • People on the internet can be mean
  • We can try to make people less rude, even while letting them be "anonymous" (previous work, AnonRep)
  • We can do that in ways that are harder to "cheat", and here are some concrete ideas of what "anonymous" actually means

Covid-19 and the natural experiment

  • novel virus with new info constantly being discovered
  • wide impact with lots of people having reason to discuss it
  • It is desired that discussion about it is generally accurate, at least as of the time a post was made

Methods of enforcing good information for Covid-19

  • reputation (as a threshold)
  • account age (as a threshold)
  • link to sources
  • Admins' "secret sauce" to combat bots

Reputation was insufficient alone to encourage good behaviour

  • Reddit style reputation did not work well

Other reputation functions are possible though!

First principles of reputation

  • How much do people trust this individual?
  • How much do they do so "now"?
  • Are they at least not a troll?

First ideas:

  • Ask everyone every time I need to know
  • Have everyone put their opinions on some kind of bulletin board, and use their most recent posts

How we arrive at PRSONA and AnonRep

  • semi-trusted servers can make sure scores follow users despite pseudonym changes
  • They don't have to be able to track users themselves to do this
  • AnonRep uses reddit-style reputation (which makes the problem simpler)
  • PRSONA uses a different form (where votes can always be updated, and each user only gets one)

Still interesting unanswered question:

  • How do users actually interpret reputation scores?
2021-05-19 Ted

Links:

  • https://blog.cryptographyengineering.com/2018/10/19/lets-talk-about-pake/

Password Authenticated Key Exchanges (PAKEs) and the "Quantum Annoying" Property

PAKEs:

  • to do a key exchange, and rather than using PKI, use passwords

Typical method:

  • Use TLS to establish a secure connection, send password in the clear, server checks the hashed password

Problems:

  • No way for a client to know that the server is handling the password properly: are they really storing only a hash? Are they properly deleting the cleartext after login?
  • Is the extra round necessary? (Authenticate server, establish key, then authenticate client)

The solution: integrate the password check into the key exchange itself

  • less communication
  • don't send cleartext password
  • have some assurance that server is handling passwords in a reasonable way

Why aren�t PAKEs everywhere?

  • Patents (just expired)
  • Lack of standards (IETF draft available now)
  • Doesn't fit into current internet infrastructure

Two styles of PAKEs: balanced and augmented

Balanced:

  • Both parties share the same password and mutually authenticate each other

Augmented:

  • Client has password, server has F(password) where F is a one-way function

We are focusing on balanced PAKEs

What is a secure PAKE?

  • Offline dictionary attacks should never be feasible
  • No F(pwd), where F is a function that anyone can compute, should ever be sent to the other party
  • Forward secrecy: If pwd is later compromised, old messages can't be decrypted
  • It is unavoidable that each online interaction Eve does gives her one guess at the password

CPace Security:

  • Use Diffie-Hellman with Hash(pwd) as the base point
  • Eve sees a group element U and knows that U = Hash(pwd)^u
  • But she doesn't know u
  • Eve knows that K = Hash(pwd)^(uv) = U^v = V^u
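A toy sketch of that idea in a multiplicative group mod a prime. Real CPace uses an elliptic-curve group and a proper hash-to-group map; the prime and scalars here are purely illustrative.

```python
import hashlib

# Toy balanced-PAKE flow: the DH base point is derived from the password
# and is never sent, so K = Hash(pwd)^(u*v) requires knowing pwd.
p = 2**61 - 1  # a Mersenne prime; multiplicative group of order p - 1

def hash_to_group(pwd: str) -> int:
    h = int.from_bytes(hashlib.sha256(pwd.encode()).digest(), "big")
    return h % p or 2  # avoid the degenerate element 0

pwd = "correct horse"
g = hash_to_group(pwd)          # password-derived base point, never sent
u, v = 123456789, 987654321     # the parties' secret scalars (fixed here
                                # for illustration; really random)

U = pow(g, u, p)  # client -> server
V = pow(g, v, p)  # server -> client

# Both sides derive the same shared key K = g^(u*v) mod p.
assert pow(V, u, p) == pow(U, v, p)
```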

2019: CFRG asks for candidate PAKEs for use in IETF protocols, one balanced, one augmented

2020: CPace announced as the balanced winner, OPAQUE as the augmented winner

Considerations in deciding winners: aspects of security proofs, flexibility, efficiency of protocols, etc.

Can you compromise CPace with a quantum computer?

You don�t have enough info to solve the discrete log problem as the generator is never sent

Solving an entire discrete log problem for each guess of the password

Estimates of how long a mature quantum computer would take to solve a discrete log range from seconds to minutes or hours

The above property was dubbed "Quantum Annoying": the only feasible offline dictionary attack is one that requires a discrete log solution for each guess of the password

Extend the Generic Group Model with a discrete log oracle to prove Quantum Annoying-ness

2021-05-26 IanG  

Private Blocklist Lookups with Checklist

Blocklists are used by browsers to deter users from accessing malicious websites

Checks a site against two lists:

  • certificate revocation list
  • safe browsing list

Problem:

Send the entire block list to the client:

  • the block list is huge though (~93 MB)

Browser pings server every time to look in the block list:

  • The server learns about every site you visit

A filter is constructed such that a negative result is 100% certain, but it allows for false positives

Only do an online lookup when the local filter misses, but this has been shown to leak significant information
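A Bloom filter has exactly this property: a miss is a guaranteed negative, while a hit may be a false positive and triggers the online check. A minimal sketch (size and hash-count parameters are illustrative):

```python
import hashlib

class BloomFilter:
    """Tiny Bloom filter: a miss is a guaranteed negative, but a hit
    can be a false positive."""

    def __init__(self, m=1 << 16, k=3):
        self.m, self.k, self.bits = m, k, bytearray(m)

    def _positions(self, item: str):
        # k independent positions derived from salted SHA-256 hashes
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, item: str):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def maybe_contains(self, item: str) -> bool:
        return all(self.bits[pos] for pos in self._positions(item))

bf = BloomFilter()
bf.add("evil.example.com")
assert bf.maybe_contains("evil.example.com")  # added items always hit
# Most other sites miss locally, so no online lookup is needed for them.
```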

Private Information Retrieval:
Alice wants to look something up on a database server
The database responds and has no idea which record was queried


2-server PIR
Two servers that have identical copies of the database and don't collude

  • Alice sends a query to each database
  • Each database independently computes a response and replies
  • Alice combines the responses to get the record she cares about
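The steps above can be made concrete with the classic 2-server XOR-based PIR scheme, sketched here over a toy database of one-byte records:

```python
import secrets

# Classic 2-server XOR PIR over a database of n one-byte records.
db = bytes([10, 20, 30, 40, 50, 60, 70, 80])
n = len(db)
want = 5  # the index Alice wants, hidden from both servers

# Alice sends each server a random subset of indices; the two subsets
# differ only at the wanted index, and each one alone looks uniform.
q1 = {i for i in range(n) if secrets.randbelow(2)}
q2 = q1 ^ {want}  # symmetric difference flips membership of `want`

def answer(query):
    """Each server XORs together the records named in its query set."""
    acc = 0
    for i in query:
        acc ^= db[i]
    return acc

# XORing both answers cancels every record except the wanted one.
record = answer(q1) ^ answer(q2)
assert record == db[want]
```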


Offline-Online PIR
The server has to do a linear amount of work, but it can do a lot of it up front rather than in real time

  • Prior to a query, Alice can download a "hint" <- Takes a linear amount of time. Hint is much smaller than actual database
  • Alice can use hint to construct a PIR query that takes the database less time to answer

Does Alice have to get a new hint? What happens if the database gets updated after Alice gets the hint?

  • Currently after every query, and/or after every database update Alice needs to get a new hint

Puncturable sets:
paper: https://eprint.iacr.org/2021/345

New abilities:
GenWith(X) generates a key that when evaluated generates a set of sqrt(n) elements that contains X, but X is not distinguishable from the other elements.

Puncture(X) generates a set that had X removed from the set constructed by GenWith

Hint generation:
Run Eval on the key, take all the records that correspond to the elements of the generated set, and XOR them together.

Server does this lambda * sqrt(n) times <- every record should appear in one of these sets

What do queries look like?
Alice sends a (different) punctured key, and a (different) random record from the set to both servers

Server sends back the XOR of all the records associated with the punctured key, as well as the record that was randomly selected

Alice uses the hint from the server she isn't querying to construct one query, and uses GenWith to construct the other

Puncture both

Alice updates her hint by using the response to the query generated with GenWith

2021-06-02 David  

Supersingular Isogeny Key Encapsulation

NIST PQC:

  • US National Institute of Standards and Technology
  • Standardization process for post-quantum public-key crypto schemes
  • Several rounds to determine which scheme to standardize

SIKE (Supersingular Isogeny Key Encapsulation)

  • IND-CCA2 KEM
  • Based on Supersingular Isogeny Diffie-Hellman
  • Uses Hofheinz et al. Transformation on SIDH to achieve CCA security

Changes for SIKE in third round:

  • Faster implementations of compressed variants
  • Hardware implementations for all parameter sets
  • Optimized ARM Cortex M4 implementations
  • Updated security analysis

SIKE has by far smallest public key size

  • There is a compressed form, but takes additional computation

Baseline security: O(fourth_root(p))

The case for SIKE:

  • Small key size
  • Straightforward relationship between parameter size and security
  • No reduction in concrete security since inception (10 years!)
  • Perfectly correct: no errors, reconciliation, or decryption failures
2021-06-02 Jiaya   Multi-stage Adaptive Authentication and Access Control

What is Adaptive Authentication?

  • An adaptive authentication system is able to automatically modify its behaviour and/or structure in response to changes in its operating environment.

Whether to authenticate:

  • Reduce the frequency of explicit authentication

How to authenticate:

  • Choose the best authentication mechanism

When to (re-)authenticate:

  • Continuous Authentication
  • Progressive Authentication

Adaptation Goals:

Security

  • Risk-awareness
  • Low false acceptance rate
  • Responsiveness

Usability

  • Efficiency
  • Low false rejection rate
  • Low power consumption

Adaptation Logic:

Condition

  • Contextual factors: location, time, etc

Adaptation

  • Structural: (de)activate one or several authenticators
  • Parametric: modify parameters and/or change behaviours

Issue 1: All-or-nothing access control

Unlocked: full access

Locked: no access/explicit authentication required

Issue 2: Hierarchical adaption

  • Multi-modal continuous authentication

Multi-stage Adaptive Authentication

Organize adaptation policies in different stages

  • Higher uncertainty of identity, stricter access control
2021-06-09 Cecylia  

Network Performance of P2P Proxies

Proxy servers:

  • Software is run on servers somewhere (usually DigitalOcean, AWS, Greenhost, ...)
  • Acts as a redirection point, a middle man for traffic

Snowflake:

  • Users running proxies on home networks or mobile phones
  • Share bandwidth with users that are being censored
  • No port forwarding!

The Internet is optimized for client-server model

  • For "user" networks (mobile networks, home networks, etc)
  • Download speeds much higher than upload speeds
  • VPS speeds:
  • Up/down: 1 GB/s

Throughput: limited by throughput of lowest throughput link

Sometimes we want more reliability than what reliable network protocols provide

If a relay drops, the user just connects to another

TCP meltdown problem:

  • If you build a reliable protocol on top of TCP, duplicate information can be sent, more packets get dropped, and it spirals out of control

Questions:

  • Is there congestion at the proxy or endpoints?
  • Can we tune or choose reliable protocols to perform better in these conditions?
  • How can we split traffic between multiple proxy peers?
  • Can we assign proxies to clients depending on geographic location?
2021-06-16 Bailey  

Caring about Sharing

Trying to go beyond Two-Party Advertising

  • Private data is shared with advertisers
  • Example: Tim Hortons was sharing data with a third party as well as its parent company

Tech companies partner with hospitals

Secure Computation:

  • Private Set Intersection
  • Private Machine Learning
  • Differential Privacy

Law and Policy

Privacy policies have extremely broad terms:

“Partners”, “third parties”, “affiliates”

How does the overall acceptability vary across different types of multiparty collaborations?

How does acceptability vary in multiparty data collaboration for different user controls?

Survey was conducted

  • 916 participants
  • Each user receives 1 of 12 scenarios and a series of user controls and privacy mechanisms

Types of Multiparty Data Collaboration

  • Validation
  • Two-Way Two-Party Exchange
  • One-Way Two-Party Exchange
  • Acquisition
  • Merger then acquisition
  • Many-to-one Exchange

Results:

  • Concealed consent received the highest proportion of “Completely Unacceptable” answers
  • Opt-in consent had the highest proportion of “Completely Acceptable”

Big takeaways from free-form responses:

  • Some people said it was fine and needed for advertising
  • A lot of people hated it, said it often felt like entrapment
  • Some said it is what it is
  • Biggest thing is data collection should require consent

Take-aways:

  • Disambiguate Third Parties
  • Explicit over Implicit Consent
  • Communicating Privacy
2021-06-16 Yousra   Dissecting Residual APIs in Custom Android ROMs

Residual APIs

  • Residuals may lead to the following:
  • Increase code complexity
  • Broaden attack surface
  • In Android’s ecosystem, the security implications of Residuals are not well studied
  • This project bridges the gap by performing a large-scale security investigation of Residuals in Android

A Residual API is a private API that is not used on a particular device but was used in earlier versions

Model Residual APIs

  • Originally intended for use in specific models with certain functional requirements
  • Yet are still defined in other models without those functional requirements

Security Implications of Residuals:

  • As residuals are deemed unnecessary/nonfunctional, they may be naturally overlooked during integration and version updates

ReM - Analysis techniques that detect and evaluate the risks of custom Residuals

Detecting Residual Use Points:

  • Collect public entry points leading to the API

Detecting Evolution-induced Access Control Errors

Unsound Security Features

  • Locate Undefined custom permissions
  • Locate package name checks
  • Locate references to Deprecated Access Control Features

Obsolete Access Control Enforcement

  • Perform a highly-localized inconsistency analysis

Found that most flaws existed in residual APIs rather than in non-residuals

2021-06-23 Thomas  

Is Differential Privacy the Right Defence against Membership Inference Attacks?

Membership Inference:

Given a model (Target Model), and a datapoint, was the datapoint used to train the model?

Membership Experiment:

Adv = 2*Pr(Adversary is correct) - 1

= TPR - FPR

Standard ML Models are Vulnerable to Membership Inference Attacks

Differential privacy allows certain bounds to be proven

The bounds are quite loose!

Contributions

  • Investigate prior membership experiments and provide a tighter bound
  • Construct a generalized membership experiment
  • Evaluate the performance of off-the-shelf MIA
  • Bias

Tighter bound:

Adv <= (e^eps - 1 + 2*delta) / (e^eps + 1)
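The two quantities above can be combined into a quick numerical check. A minimal Python sketch (function names are illustrative, not from the talk):

```python
import math

def mi_advantage(tpr: float, fpr: float) -> float:
    # Membership advantage: Adv = 2*Pr[adversary correct] - 1 = TPR - FPR
    return tpr - fpr

def dp_advantage_bound(eps: float, delta: float) -> float:
    # Tighter bound from the talk: Adv <= (e^eps - 1 + 2*delta) / (e^eps + 1)
    return (math.exp(eps) - 1 + 2 * delta) / (math.exp(eps) + 1)

# An attack with TPR 0.6 and FPR 0.4 achieves advantage 0.2; comparing it
# against the bound shows which (eps, delta) guarantees would rule it out.
adv = mi_advantage(0.6, 0.4)
bound = dp_advantage_bound(0.5, 1e-5)
```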

2021-06-23 Rasoul  

S.E.A.L.

Simple Encrypted Arithmetic Library

  • Homomorphic Encryption Library
  • Developed by Microsoft

Homomorphically encrypted data can have computation performed on it

S.E.A.L. Supports two encodings:

  • Polynomial Encoding
  • Batch Encoding (Allows for SIMD operations)

Noise Budget:

  • Each ciphertext has a noise budget
  • As operations are performed, more noise is introduced
  • Determined by N and q

Modulus switching:

  • A chain of modulus primes is used to reduce noise after operations
  • When should you use modulus switching? It’s not obvious
  • Compilers can optimize for this
2021-06-30 Hans  

PACStack an Authenticated Call Stack

Control-flow attacks:

  • Exploit memory error to inject shell code into writable memory (usually stack)
  • Corrupts return address to redirect execution flow

This can be prevented with modern defences

Return address corruption still prevalent

  • Code reuse attacks (Return-oriented programming)

Arm Pointer Authentication

  • Hardware implementation that ensures pointers are not maliciously replaced by using MACs
  • Susceptible to reuse attacks

Reuse attacks:

  • Swap pointers around to change execution flow to technically valid, but unintended location

PACStack: High-level

Generate an authentication token using Pointer Authentication that depends on the previous token and the return address being protected

Forms a chain of authentication tokens, making it almost impossible to reuse pointers

Collisions can still happen though and allow reuse of PAC chain

How do we prevent this?

  • Prevent the attacker from recognizing collisions by masking each auth stored in memory

Performance is relatively reasonable seeing a worst case performance penalty of ~15%
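The chained-token idea can be illustrated in software. A toy Python model of the chain, where HMAC stands in for the hardware PAC primitive (the key, addresses, and token width are all made up):

```python
import hmac, hashlib

KEY = b"per-process PA key (illustrative)"

def pac(prev_token: bytes, ret_addr: int) -> bytes:
    # Each token authenticates the return address together with the previous
    # token, so every frame is bound to the whole chain below it.
    msg = prev_token + ret_addr.to_bytes(8, "little")
    return hmac.new(KEY, msg, hashlib.sha256).digest()[:8]  # PACs are short

chain = [b"\x00" * 8]                       # auth_0: constant initial token
for ret in [0x400123, 0x400456, 0x400789]:  # simulated call stack
    chain.append(pac(chain[-1], ret))

# On return from the innermost frame, recompute and compare the token:
assert pac(chain[-2], 0x400789) == chain[-1]
```

Because each token depends on every earlier return address, swapping in a token observed elsewhere breaks verification unless the attacker finds a collision, which is why masking the stored tokens matters.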

2021-07-07 John  

What are Reed Solomon Codes?

Error Correction codes (ECC):

Error-correcting codes add redundant information to the data

This redundancy can be used to reconstruct the data in the presence of noise

Reed Solomon ECC

  • Used in a lot of places (CDs, DVDs, QR codes)
  • Operate on blocks of data which are treated as finite field elements
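The block/finite-field view can be illustrated with a toy erasure-decoding sketch in Python over a small prime field (real Reed-Solomon works over GF(2^8) and also corrects errors, not just erasures; all names here are illustrative):

```python
P = 101  # small prime field for illustration; real RS uses GF(2^8)

def interp_at(points, x0):
    # Lagrange interpolation evaluated at x0, all arithmetic mod P
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * ((x0 - xj) % P) % P
                den = den * ((xi - xj) % P) % P
        total = (total + yi * num * pow(den, P - 2, P)) % P
    return total

def encode(msg, n):
    # systematic encoding: message symbols are the values of a
    # degree-(k-1) polynomial at x = 1..k; codeword extends to x = 1..n
    k = len(msg)
    pts = list(zip(range(1, k + 1), msg))
    return [(x, interp_at(pts, x)) for x in range(1, n + 1)]

def decode(received, k):
    # any k surviving (x, y) pairs pin down the polynomial again
    pts = received[:k]
    return [interp_at(pts, x) for x in range(1, k + 1)]

msg = [42, 7, 99]                        # k = 3 symbols in GF(101)
code = encode(msg, 6)                    # n = 6, tolerates 3 erasures
survivors = [code[1], code[3], code[5]]  # any 3 points suffice
assert decode(survivors, 3) == msg
```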
2021-07-14 Asokan  

Intel Software Guard Extensions (SGX)

Allows for isolating environments (Enclaves)

Implemented in hardware

Enclave code/data encrypted by hardware

Supports:

  • Sealing: a hardware-supported key turns unsecured storage into secured storage
  • Attestation

Local Attestation: One enclave verifies another on the same device

Remote Attestation

Enclave Identity (Sealing Identity)

  • Sealing authority (SA) signs enclaves prior to distribution

Several breaks in the past

  • GSM A5/1 algorithm
  • Relay attacks against NFC payments

Is SGX useless?

  • Who is the adversary?
  • Where is your vulnerable program running?
  • There is a deployment push
  • TrustZone even more widely deployed/used
2021-07-14 Hossam   Rowhammer So Far

What is Rowhammer?

DRAM is a matrix of cells that each hold a 1 or 0

  • Each cell has a capacitor

Cells are accessed at row granularity, and the row gets moved to the row buffer

Cells are packed tightly to provide more storage

“Hammering” of one row flips bits in adjacent rows (Memory integrity violation)

Hammering a row is accessing the row repeatedly

Rowhammer-based Exploits

Breaking out of NaCl sandbox

  • NaCl assumes validated code does not change
  • induce bit flips in indirect jump instructions to jump to hidden shell code

Kernel privilege escalation

  • Inducing bit flips in page tables gives a process access to its own page table

Rowhammer.js

  • No longer need cache-flush instructions

Remote browser-based attack using javascript

Throwhammer

Only need to send packets to victim

Relies on Remote DMA feature of NICs

No longer need to lure victim to malicious webpage

RAMBleed

Breaks confidentiality

Main enabler: likelihood of bit flips depends on bit pattern

Carefully arrange aggressor and victim pages in memory

Software Defences:

  • Increasing memory refresh rate which negatively affects performance
  • Reactive refresh but is hard to identify which rows are physically adjacent because of proprietary mapping
  • Physical isolation using guard rows, but wastes memory

Hardware Defences:

  • ECC, but has limited error-correction
  • Target Row Refresh, but can be circumvented using many sided RowHammer
  • Proactive throttling: prevent the row activation rate from reaching the level needed for bit flips

Good news: although still a problem today, no Rowhammer attacks have been found in the wild yet

2021-07-21 Akshaya   Privacy-preserving Web Analytics

Most websites collect user data for analytics

Threat Model:

  • Only aggregate or encrypted user data is logged
  • Reported results are differentially private
  • Website operators are trusted not to log user data in clear
  • Resistant to compulsion and single snapshot attacks

Uses PS signatures, which are randomizable

Frequency of Client Visits

  • Clients store encrypted connection counts locally
  • Server stores a differentially private histogram of frequency of client visits
  • Server learns client connection count when a client connects to it

Client-side persistent storage

  • Server inits a histogram with independent Laplacian noise added to every bin and generates a PS signature key pair
  • Client connects for the first time or reset its counter
  • Server issues a PS signature
  • Server updates its histogram
  • Client chooses randomness and randomizes the PS signature
  • Client encrypts 1 and the randomized signature using some public key encryption to obtain a token
  • Client stores token in its local storage
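The server-side histogram initialization described above can be sketched as follows (Python; the bin count and epsilon are illustrative, and the PS-signature machinery is elided):

```python
import math, random

BINS = 8  # frequency-of-visit bins (illustrative)

def laplace(scale: float) -> float:
    # inverse-CDF sampling of the Laplace distribution
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))

def init_histogram(eps: float):
    # Independent Laplace(1/eps) noise in every bin; sensitivity is 1
    # because one client changes exactly one bin by one.
    return [laplace(1.0 / eps) for _ in range(BINS)]

hist = init_histogram(eps=1.0)
# When a returning client proves its connection count c (via its randomized
# PS signature), the server increments the matching bin:
c = 3
hist[min(c, BINS - 1)] += 1
```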

2021-07-21 Andre   Connecting the NOPS: A Quantitative security metric for CFI mechanisms

CFI mechanisms restrict control flow to limit jumping to unintended locations in code

How do we compare mechanisms? Currently there aren’t accurate metrics.

Proposal: “NOPS Finder”

Identifies path between gadgets

Takes chi into consideration while finding paths between gadgets

Derive metric from ability to find such path

2021-09-21 Miti  

CS458 Spring 2021

A teaching summary & lessons learned

  • Module design, lectures and videos
    • Main changes to the content
      • Updated slides from modules 4, 5.2 and 6 as well as created new videos for those topics
      • Updated the quizzes related to the modified modules
      • Improved Blogtask (added fine-grain rubric) and interactive sessions
      • Tools used for teaching were Learn, Teams, Crowdmark and CSsharepoint
    • Module Design: Trinity of elements to convey concepts
      • a) Intended learning outcomes
      • b) Teaching & Learning Strategies
        • Interactive activities
        • Lectures
      • c) Formative/Summative Assessments
    • The goal was to create an alignment among these elements
      • Example: Concepts discussed and learned in interactive sessions (b) were later re-introduced in assignments (c)
    • Content Sources:
      • Skipping it here for fairness reasons
    • Summary of content changes:
      • Module 4
        • Organized "Threats" subsection based on interception, interruption & modification threats.
        • Included IP spoofing, ingress/egress filtering details, smurfing attacks etc.
        • Firewalls: related content to previous sections of the module (vO).
        • Explored different firewall architectures to motivate DMZes (F&F). Proxying and MiTM, Tor SOCKS proxy.
        • Examples of NIDS/HIDSes & their architecture.
      • Module 5
        • Basic Internet S&P (module 5): Extending content layer-wise.
        • Link layer - WEP.
        • Network layer - VPN types, IPSec modes.
        • Transport layer - TLS: ClientHello, ServerHello, ClientFinished.
        • Application layer - SSH, SMTP & secure email.
        • Privacy: Tor message format in terms of encapsulation. Added PIR (as a data minimization PET).
      • Modules 6 & 7
        • Access control: SQL commands for granting/revoking privileges - DAC, RBAC. MLS databases - polyinstantiation etc. (E&N).
        • Updated confidentiality, authentication, availability, auditability sections (IBM, Oracle Docs).
        • Clarified what does and does not work for the data inference problem.
        • Used existing (not-private) data release examples to motivate k-anonymity & ell-diversity.
        • Included examples for k-anonymity & ell-diversity.
        • DP: motivated the definition, described sensitivity via examples, introduced the Laplace function, claimed Laplace function provided DP (proof in interactive sessions).
        • Module 7: Included Bailey's slides on ethics; focused on cost analysis.
    • Videos
      • Limited the length to ~20min
      • A lot of the work was done on editing, captioning and providing great availability (Microsoft Streams & Learn)

  • Quizzes and Blogpost
    • Quizzes
      • Skipping some parts for fairness reasons
      • Started the quiz at the start of the module, rather than 1 week into it.
      • Changed to multiple select from multiple choice
      • Modules split in 2 parts with 1 quiz per part
      • Infinite retries without showing actual answers
      • Future terms could use the Retake incorrect questions only option
    • Blogpost: Represented 5% of the final grade
      • Students had to
        • Make a blog
        • Comment on other's blogs
        • Reply to comments on their blogs
      • Fine-grained rubric was added: assimilated criteria in bullet-point form from the webpage
      • Future terms: consider adopting a coarse-grained rubric, address blogpost/comment deadline issue
    • Final Assignment
      • Skipping it here for fairness reasons

  • Interactive Sessions
    • 5 Interactive Sessions on the following topics:
      • Wireshark
      • SAD DNS Cache poisoning attack
      • ADBc, ZKPs
      • Data Inference Problem
      • Differential Privacy
    • (Programming) TA's experience:
      • Improve "submit" command
      • A1: all-or-nothing scheme: hard to give partial marks
      • A2: Firewall IDS: Lots of time & effort spent by programming and sysadmin TAs.

  • Looking Ahead
    • Active Learning
      • Keep Q&A
      • Add/intersperse individual or peer activities
    • Teaching pedagogies
      • Flipped classrooms
        • Pre-class content
        • In-class activities
        • Post-class
    • For people looking to teach in the future the following certificates were recommended
      • FUT first, can be done as a Master's student
      • CUT afterwards
2021-09-28 Youcef  

Toward Post-Quantum Key-Updatable Public-Key Encryption via Supersingular Isogeny

  • One of the applications is for long term conversation
    • Protect the content using UPKE
    • Relies on the discrete logarithm problem, which is not quantum secure
  • Studied viability of UPKE post-quantum
  • UPKE is composed of 6 algorithms (KeyGen, Encrypt, Decrypt, GenUpdate, UpdatePrivate, UpdatePublic)
  • There are 5 properties that are desired in such an encryption scheme:
    • (1) Correctness
    • (2) Forward Secrecy
    • (3) Post-Compromise Security
    • (4) Asynchronicity
    • (5) Key Indistinguishability (not always required)
  • We have both Symmetric and Asymmetric UPKE, however Asymmetric UPKE needs a homomorphic structure
  • A desired characteristic is IND-CPA (Indistinguishability under Chosen-Plaintext Attack)
    • We can add Updatability (adversary can update or choose a key to corrupt)
  • SIDH-UPKE
    • KLPT* can help with forward security
      • Issues:
        • No Asynchronicity -> no known solution
        • The degree of the isogeny is too big
          • Solution: Separate into multiple isogenies
    • SIDH-UPKE respects only (1). Once KLPT* is introduced, it also respects (2) and (3)
  • CSIDH-UPKE
    • Issue: no forward-secrecy
      • Updated key centred around original secret key
      • Key too large after multiple updates
        • Solution: Use known cyclic group of order N. Downside is, the group structure must be known
    • CSIDH-UPKE is quantum secure
    • (5) can be obtained if the key size is limited.
2021-09-28 Lindsey  

Censorship Circumvention

Lox: Protecting the Social Graph in Bridge Distribution

  • Censorship Circumvention Research:
    • Investigating censor behaviour, tools and techniques
    • Monitoring censorship
    • Designing better circumvention tools
    • Finding flaws in circumvention tools
  • Notable usefulness example: Dramatic increase in Relay and Bridge connections during the Burma coup when the Internet was shut down.
  • Bridge: server that exists outside the censored region
  • Techniques used by the censor to find users and terminate bridges
    • Enumerating bridges
    • Blocking bridges
    • Posing as users
  • Goal: Protect users, limit censor capacity to block users and bridges
  • Prior Work:
    • Tor bridge distribution
    • Anonymous Distribution by Social Graph
    • Leveraging Trust Levels
  • A bridge is needed when access is blocked to the Tor Network
    • Setup bridge via email with bridge authority
      • Use Tor browser for example to connect to the bridge
    • This limits censors from enumerating bridges (not 100% effective)
    • Average of 12 users per new bridge
  • Core concepts:
    • Bridge distribution using anonymous credentials to protect users and Social Graph
    • Invitations: honest users have friends
    • Bridge Authority is issuer and verifier
    • Leveraging Trust Levels:
      • Accumulate trust with time
      • Trusted users can recommend friends
      • More trust -> can go to more exclusive and trusted bridges
  • Lox:
    • Limits enumeration
    • Prioritizes protecting user’s Social Graph
    • Uses Trust Levels
    • Higher Trust Levels allows for higher number of bridges and protocols
      • Lvl 0: Untrusted, 1 bridge bucket
      • Lvl 1: Trusted, 3 bridge bucket
      • Lvl 2: Trusted, Invitations
      • Lvl 3: Trusted, Migration eligible
      • Lvl 4: Trusted, Migration + Invitation eligible
    • Computes MAC for issuing and verification
2021-10-05 Sajin  

Express: Lowering the cost of metadata-hiding communication with Cryptographic Privacy

  • Metadata-resistant communications
    • Why? To protect whistleblowers
    • Why are existing mechanisms not sufficient
      • Who, when and how much is not protected
      • Only content is protected
  • Private Information Retrieval (PIR)
    • Retrieve document from a pool from remote servers without leaking which document was retrieved
  • Express: Uses PIR in reverse to improve metadata hiding communication
    • Privacy guarantee for writes but not for reads
    • Threat model:
      • 2 non-colluding servers A, B that collectively hold the contents of a set of mailboxes
      • Clients are potentially malicious
    • Private writing via PIR:
      • N mailboxes
      • Content of a mailbox is a vector D in F^n
      • Each server holds an additive secret share for this vector for each mailbox
    • To write a message:
      • Client prepares m, e_i in F^n
      • Client splits m into w_a and w_b
      • Sends w_a to A and w_b to B
      • Servers each add their w to the respective secret share of the mailbox the message is sent to
      • Issue: Scales poorly
  • Improve efficiency with Distributed Point Functions (DPFs)
    • Leverage PRFs under the hood
    • Scales logarithmically with N instead of linearly
  • Messages can only be read by the mailbox owner to whom it is sent
    • Enabled via simple symmetric encryption
  • Client requests both servers for the contents of the mailbox to read the messages
    • Dependent on honest clients
    • Malicious clients can submit malicious writes
      • Solution: auditing protocol:
        • SNIP: client proves to the server that the message is well-formed
  • Prevent targeted disruption:
    • Switch mailbox addresses from physical to virtual address (128-bit)
      • Client needs to know both physical and virtual address to send a message
  • Performance:
    • For 1M mailboxes. 101x less communication cost than Riposte on the client side
    • Savings come from:
      • Better auditing
      • Faster DPFs
    • Client cost constant irrespective of the number of mailboxes. Riposte scales linearly with N.
  • Cover traffic and plausible deniability
    • Send dummy messages to cover traffic using low client-side costs
  • Conclusion:
    • Improves state-of-the-art both asymptotically and in concrete performance numbers
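The private-write core of Express (before the DPF optimization) can be sketched with plain additive secret sharing (Python; the field size and mailbox count are illustrative):

```python
import secrets

P = 2**61 - 1  # prime field for message words (illustrative)
N = 4          # number of mailboxes

server_a = [0] * N  # each server holds one additive share per mailbox
server_b = [0] * N

def split(m: int):
    # additive secret sharing: w_a + w_b = m (mod P); each share alone
    # is uniformly random and reveals nothing about m
    w_a = secrets.randbelow(P)
    return w_a, (m - w_a) % P

def private_write(mailbox: int, m: int):
    # In Express the mailbox index itself is hidden with a DPF; here the
    # shares are applied directly to show the secret-sharing step.
    w_a, w_b = split(m)
    server_a[mailbox] = (server_a[mailbox] + w_a) % P
    server_b[mailbox] = (server_b[mailbox] + w_b) % P

private_write(2, 1234)
# The mailbox owner reads by querying both servers and recombining:
assert (server_a[2] + server_b[2]) % P == 1234
```

In the naive version the client must send a share vector covering every mailbox, which is the linear scaling the talk improves on with DPFs.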
2021-10-05 Simon  

Mixes

  • There are 2 types of anonymous communication systems:
    • Low latency
      • For example, TOR
    • High latency
  • In the presence of a Global Passive Adversary (GPA), TOR cannot protect its users
  • Solution: Mixes
    • Adding delays, cover traffic and change appearance of messages
      • Delays prevent arrival/departure timing
      • Cover traffic is there to fool observers, limiting their ability to guess the sender/receiver of a message
  • Types of Mixes
    • Mixes operating in rounds:
      • Threshold Mix: batched with a threshold trigger
      • Timed Mix: batched with a time trigger
      • Binomial Pool Mix: (can be timed or threshold triggered) probabilistic (alpha chance message is sent, 1-alpha chance it is kept in the pool).
    • Others
      • Stop-and-Go (SG)
      • Continuous Mixes
  • Mixnet Topology:
    • Cascade
    • Free route
    • Stratified
  • Cover Traffic Types
    • End-to-end
    • Mix-to-Mix
    • Sender/Receiver based
  • Loopix
    • Stratified mixnet composed of:
      • Service providers
      • Continuous mixes (Poisson distribution on delays)
      • UDP, no circuits
      • Real messages
      • Drop dummies
      • Loop dummies: loops back to service provider
      • Mix loop dummies: helps against active attacks
    • Low latency (seconds)
  • Nym
    • Composed of:
      • Users and Provider
      • Mixnet (like Loopix) and Gateways
      • Validators
      • Nym blockchain
    • It is for profit: Users pay validators
      • Users pay validators for a bandwidth certificate
      • User sends the certificate to the gateway (which talks to the provider) with the message to prove they paid
    • Only gateway and validators know user IP
    • Money limits corruption/Sybil attacks
      • GPA/corrupt with budget
      • Depends on honest users total budget vs GPA budget ratio
    • Communication epochs: restructure nodes by shuffling them periodically
    • Details
      • Testnet has around 9k nodes
      • Launch late 2021/early 2022
      • Chat app and Crypto wallet included
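The binomial pool mix described above is easy to simulate (Python sketch; alpha and the message counts are made up):

```python
import random

def binomial_pool_mix(arrivals, alpha, rounds, seed=1):
    # Each round, new messages join the pool; every pooled message is then
    # forwarded independently with probability alpha, else it waits.
    rng = random.Random(seed)
    pool, sent = [], []
    for r in range(rounds):
        pool.extend(arrivals(r))
        still_pooled = []
        for msg in pool:
            if rng.random() < alpha:
                sent.append((r, msg))  # departure round is decoupled
            else:                      # from the arrival round
                still_pooled.append(msg)
        pool = still_pooled
    return sent, pool

sent, pool = binomial_pool_mix(lambda r: [f"m{r}-{i}" for i in range(3)],
                               alpha=0.5, rounds=10)
```

Because each message's departure round is random, an observer cannot match arrival and departure times, which is exactly the timing protection the mix is meant to provide.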
2021-10-12 Vasisht  

Fault Tolerance of Neural Network in Benign and Adversarial Settings

  • Reliable Deep Learning
    • Deployed for critical applications for long periods of time
  • Fault Tolerance
    • Goal: NN performs well in presence of node/synapse fault + graceful degradation
    • Def: Epsilon-Tolerance
      • Epsilon=0: complete tolerance (ideal)
      • Epsilon>0: partial fault tolerance (realistic)
    • Stuck @0 Fault
      • Parameter values/intermediate computation
        • Can be caused, for example, by a fault in a MAC unit
    • Generalization Error: Metric for fault tolerance
      • Low generalization error means higher fault tolerance
      • High generalization error means low fault tolerance
  • Enhancing Fault Tolerance: 3 approaches
    • Traditional Reliability Engineering
      • Adding redundancy (bad for resource constrained systems)
      • Augmenting synapses and nodes (bad for overparametrized NNs)
    • Fault Tolerance as a constraint
      • Computationally expensive and not scalable to large NNs.
    • Regularization
      • Restricts overfitting
      • Trade-off between test accuracy and generalization error
  • Approach Used:
    • 2 models
      • Feature extractor (FE)
      • Full connected network (FC)
    • 2 phase process
      • Phase 1: Train feature extractor
        • Game 1: Identify and extract robust features
        • Game 2: Distributional smoothing of extracted features
      • Phase 2: Supervised fine tuning
    • Phase 1:
      • Game 1:
        • FE: identify and extract features
        • Generator: Reconstruct input image from FE’s extracted features
      • Game 2: Minimax optimization between FE and a discriminator
        • Discriminator distinguishes FE’s output from Gaussian distribution
    • Evaluation
      • Better trade-off between accuracy and generalization error. Better than L1 and L2 regularization.
  • Adding noise during training
    • Improve fault tolerance
    • Can be added with an objective to penalize large parameters (adding noise to the gradient)
  • Impact of adversarial noise on fault tolerance
    • Model overfits
    • Higher generalization error compared to SGD/regularization
    • Increasing noise results in higher generalization error and lower fault tolerance
  • Impact of gradient noise on FT
    • DPSGD enhances fault tolerance (+privacy) with a bound on generalization error.
2021-10-12 Jason  

The problem with DNS

  • Designed a long time ago with a design philosophy that does not align with today’s world anymore
  • Can be abused by
    • Fragmentation
    • Most DNS messages are sent using UDP
    • Attacker registers long domain name
      • Requests a DNS resolver
      • Response exceeds what UDP packet can fit -> 2 fragments are sent back
      • Malicious middle box tries to send fake 2nd fragment to DNS resolver
      • Caching means all clients are affected
  • Solution: DNSSEC
    • Use digital signatures to sign sets of resource records
    • 2 keys
      • Key signing Key
      • Zone signing Key
  • How to maintain key integrity?
    • Construct a chain of trust
      • Root key on host machine
      • When zone signing key is queried, use trust anchor to verify the signature
    • Return of fragmentation for DNSKEY and RRSIG (large records)
      • Increases likelihood of fragmentation by a lot
      • Firewall can also block fragments -> denial of service
      • Fix: Use truncation flag and follow up with a TCP connection
    • Current approach of the DNS community: keep key and signature sizes relatively small
  • DNS protocol will need to be modified in some way
    • Early idea that is being run:
      • Move fragmentation to the application layer -> help with the firewall problem
      • Use “pseudo resource record” called RRFRAG when response too big for a UDP packet
      • Still keep truncated flag + TCP for backwards compatibility
      • DNS resolver decides whether to keep using UDP or switch to TCP when receiving truncated flag.
2021-10-19 Setareh  

The Last Mile: High-Assurance and High-Speed Cryptographic Implementations

  • Goal: Framework for efficient and trustworthy crypto libraries
    • Correct
    • Protected against side-channels attacks
    • Formal security proofs
  • Jasmin: Language for speed and compiler formally verified
    • Allows for “assembly in the head” programming
  • EasyCrypt: Proof assistant for provable security
    • Game-based cryptographic proofs
  • Enhanced Jasmin
    • What was added/improved:
      • Added SIMD instructions
      • Support for different value sizes
      • Instruction distributors
      • Embedding of Jasmin in EasyCrypt
    • Enhanced Jasmin compiler creates EasyCrypt models for
      • Functional correctness
      • Security proofs
      • Constant time
    • Added safety check to Jasmin Compiler
  • Safety analysis: By Jasmin Compiler
    • No division by 0
    • Out-of-bounds array accesses
    • Variable initialization
    • All loads and stores take place in allocated chunk of memory
  • Benchmarks
    • ChaCha20
    • Poly1305
2021-10-19 Emily  

Similarity Detection in a sparse dataset setting

  • Problem: Privately detecting similarities in a sparse data setting
    • Ensure differential privacy
  • Idea:
    • Client encrypts their data and sends it to the server
    • Server does nearest neighbour using several K-D trees
      • Use interactive DP protocols
    • Client receives results
  • K-D trees:
    • Find nearest points near given data point
    • Binary tree where each node has K values (dimensions)
    • Alternate each dimension at each depth when traversing down or up the tree
    • Element of randomness in tree, 2 trees can look different even with the same starting dataset
    • Indices “split” the plane into regions
    • Need to back up through the height of the tree at most twice when searching -> O(logn) runtime
    • K-D trees useful as long as there are more items than dimensions
  • Overall protocol
    • Client encrypts data and sends it to the server
    • Server constructs several K-D trees of the dataset
    • Server finds nearest neighbour in each K-D tree
    • Client keeps record of each answer
  • Summary
    • Solving the problem of privately detecting similarities in a sparse data setting
    • Using K-D trees for efficient search
    • Interactive protocol with the client
    • Assuming a semi-honest server
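A minimal k-d tree build and nearest-neighbour search, illustrating the O(log n) descent described above (Python; the encryption and DP machinery are elided, and all names are illustrative):

```python
from math import dist

def build(points, depth=0):
    # k-d tree: alternate the splitting dimension at each depth
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return (points[mid],
            build(points[:mid], depth + 1),
            build(points[mid + 1:], depth + 1))

def nearest(node, q, depth=0, best=None):
    if node is None:
        return best
    point, left, right = node
    if best is None or dist(q, point) < dist(q, best):
        best = point
    axis = depth % len(q)
    near, far = (left, right) if q[axis] < point[axis] else (right, left)
    best = nearest(near, q, depth + 1, best)
    # only descend the far side if the splitting plane is closer than best
    if abs(q[axis] - point[axis]) < dist(q, best):
        best = nearest(far, q, depth + 1, best)
    return best

tree = build([(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)])
found = nearest(tree, (9, 2))
```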
2021-10-26 Zeinab   Dissecting Residual APIs in Custom Android ROMs

  • Original Equipment Manufacturer (OEM) Private APIs
    • Framework layer
  • APIs are added, modified and removed frequently
  • OEM private APIs are added and removed erratically
    • Leads to Residual APIs
  • Consequences of Residual APIs
    • Increase in code complexity
    • Compatibility issues
    • Broaden attack surface
  • Historical Residual APIs
    • Previously used API
    • Defined but not used in at least one image
  • Lifespan of an API
    • Active Lifespan:
      • Actually used
    • Residual Lifespan:
      • Defined but not used
  • Model (device) Residual APIs
    • Originally for use in specific models (example: phone model)
  • Residual Safeguards
    • Configuration checks
    • Access Control Checks
  • Residual APIs may be overlooked during updates, leads to:
    • Unsound Security Features
      • Can allow attacks like keylogger attacks
    • Obsolete Access Control Enforcement
      • Failure to keep with improving standards
  • Large Scale Study performed:
    • 628 custom ROMS
    • 26,833 distinct OEM APIs
  • REM (REMNANT) structure:
    • ROM analyzer
      • Finds likely residuals
    • Usage-pattern analyzer
    • Risk Identifier
      • Generates Residual’s ICFG
      • Collect defined security features and see if they are used
      • Performs a differential analysis
      • Intra-ROM convergence analysis
      • Cross-ROM Inconsistency analysis
  • LG and Huawei have the highest number of Residual APIs.
2021-11-02 Florian  

Is Differential Privacy what you want to protect privacy in ML?

  • Need to protect sensitivity of training information for ML models.
  • Differential privacy:
    • Given random mechanism M, two adjacent inputs A, B: Pr[M(A) in S]/Pr[M(B) in S] ≤ e^ε
    • It is a provable guarantee
    • Maintained in post-processing
    • Differential private mechanisms are composable
      • Arbitrary background knowledge/executions/number of changes in the data set.
  • Differential private SGD (DP-SGD)
    • For each iteration, perform gradient clipping and perturbation
    • Clipping limits the impact of any individual input
    • DP-SGD assumes the worst case: the one input that reverses the gradient direction in each iteration
    • The definition of differential privacy assumes that the adversary knows the entire data set except the one input in question
  • Empirical Privacy:
    • Practical success rate
  • Bound on membership inference
  • Membership is a property of the entire data set, not only the single member
  • There exists a trade-off between privacy and accuracy
  • How do you explain differential privacy?
    • Differential identifiability: was a sample part of the training set?
    • Use Bayes’ Theorem
    • Theoretical privacy: can provide a bound of posterior belief
    • Observation
      • The DI attack is tight on the bound when using data set sensitivity
      • There is a gap when using global sensitivity
  • Summary:
    • Protection against membership inference means very strong privacy parameters in DP-SGD necessary (might render the training useless)
    • To protect against membership inference, ML best practices are better suited than DP-SGD
    • Open Questions: What besides DP can we use to measure privacy in ML?
      • Explainability is an important aspect
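The clip-then-perturb step of DP-SGD mentioned above can be sketched as follows (Python/NumPy; the clip norm, noise multiplier, and gradients are illustrative):

```python
import numpy as np

def dpsgd_step(per_example_grads, clip_norm, noise_mult, seed=0):
    # DP-SGD: clip each per-example gradient to bound any one input's
    # influence, then add Gaussian noise calibrated to that bound.
    rng = np.random.default_rng(seed)
    clipped = [g * min(1.0, clip_norm / max(np.linalg.norm(g), 1e-12))
               for g in per_example_grads]
    total = np.sum(clipped, axis=0)
    noise = rng.normal(0.0, noise_mult * clip_norm, size=total.shape)
    return (total + noise) / len(per_example_grads)

grads = [np.array([3.0, 4.0]), np.array([0.3, -0.1])]
update = dpsgd_step(grads, clip_norm=1.0, noise_mult=1.1)
```

The clipping is what enforces the worst-case assumption discussed above: no single example can move the sum by more than `clip_norm`, regardless of its true gradient.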
2021-11-02 Xinda  

Model Poisoning Attacks and Defences on Federated Learning

  • Federated Learning
    • Multiple owners, single model
    • Each client trains locally
    • Local models are sent to the server, who aggregates them to form a global model that is then redistributed.
  • Robustness Issues:
    • Clients are not trusted
    • Attackers can backdoor/lower the accuracy of the global model
    • Even a single attacker can cause a lot of damage.
  • Existing Defenses:
    • Detect and remove outliers from model updates before aggregating
  • Poisoning attacks
    • Data poisoning attack
    • Model poisoning attack
  • General Attack Objective: Find a malicious update within the space of the benign updates that maximizes the distance between the benign and malicious aggregates
  • Tailored Attack: Find perturbation vector and a scalar that maximizes the distance between benign and malicious aggregates
  • AGR-agnostic attack: keep malicious updates within the scope of the benign updates.
  • Experiment:
    • MNIST non-iid setting
    • Non-iid common in real world
    • 100 clients in total, 30 participants
    • 20% of clients are malicious
    • Attacks have a very significant impact
  • Their own defence: Clustering-based Defence
    • Clustering limits the impact of the attacker
    • Evaluated using the same setting
    • Very little accuracy is lost relatively speaking
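As a concrete example of outlier-robust aggregation (a standard baseline, not the clustering-based defence proposed in the talk), a coordinate-wise trimmed mean drops the extreme values in each coordinate before averaging:

```python
def trimmed_mean(updates, trim_k):
    """Coordinate-wise trimmed mean: for each coordinate, drop the
    trim_k largest and trim_k smallest client values, then average
    the remainder.  Tolerates up to trim_k malicious clients."""
    dim = len(updates[0])
    result = []
    for j in range(dim):
        col = sorted(u[j] for u in updates)
        kept = col[trim_k: len(col) - trim_k]
        result.append(sum(kept) / len(kept))
    return result
```

A single wildly scaled malicious update is simply discarded in every coordinate, whereas a plain mean would be dragged arbitrarily far.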
2021-11-09 Faezeh  

What is Machine Unlearning

  • Why would we need machines to unlearn?
    • Privacy
    • Security
    • Usability
    • Right to be forgotten: mandates that companies take reasonable steps to achieve erasure of a person’s data
  • Easy to remove data point from dataset but hard to unlearn what was learned from it
    • stochasticity in training
    • Sequential training process
  • Unlearning: if you learn a data point then unlearning it should bring you back to where you started before you learned said data point
  • Solutions?
    • Differential privacy: Bound on contribution from each data point
    • Re-training? Naive, has a large computational cost
    • Statistical query learning
      • Does not easily scale to more complex models
        • Not a general solution
    • Sharded, Isolated, Sliced and Aggregated training (SISA)
      • Fast unlearning
      • Weak-learner
      • Requires additional storage
      • Models are sharded, where each shard is trained on different data points: can just retrain the specific shard
        • Can also slice the shards
  • Verified Unlearning: The user does not trust the service provider. How can the user verify that their data has been unlearned?
    • Probabilistic Verification of Machine Unlearning:
      • Membership inference attacks
      • Backdoor attacks
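The SISA sharding idea can be sketched in a few lines of Python (a toy: `train_fn` stands in for any per-shard training routine, and a hash assigns points to shards; integer ids hash deterministically, strings would need a stable hash):

```python
from collections import defaultdict

def assign_shards(dataset_ids, num_shards):
    """Deterministically assign each data point to exactly one shard."""
    shards = defaultdict(list)
    for x in dataset_ids:
        shards[hash(x) % num_shards].append(x)
    return shards

def unlearn(shards, train_fn, models, point):
    """SISA-style unlearning: locate the shard holding `point`, delete
    the point, and retrain only that shard's model.  All other shard
    models are untouched.  Returns the retrained shard's id."""
    for sid, data in shards.items():
        if point in data:
            data.remove(point)
            models[sid] = train_fn(data)
            return sid
    return None
```

The cost of unlearning one point is one shard's training run instead of a full retrain, which is the storage-for-speed trade-off noted above.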
2021-11-09 Shelly  

Adversarial Policies: Attacking Deep Reinforcement Learning

  • Reinforcement Learning (RL):
    • Agent interacts with the environment
    • Sequential decision-making
    • Finding optimal policy to maximize reward
  • Deep RL:
    • Combining RL and Deep Neural Networks (DNN)
    • Good with high-dimensionality
  • Adversarial Examples:
    • Perturbations added to clean input
    • In RL, add noise to input state to make the model make a bad decision
  • Multiple Agent Environment
    • DRL agents playing against each other
    • Adversarial Policy:
      • Train adversarial agent to exploit weaknesses in the victim agent
  • Adversarial Model
    • Black-box
    • Unlimited queries
    • Victim follows a fixed stochastic policy
    • Can be modelled as a two player Markov game
      • Since victim is fixed, the problem for the adversary can be reduced into a normal Markov Decision Process (MDP)
    • 4 games:
      • Kick and Defend
      • You shall not pass
      • Sumo humans
      • Sumo ants
    • The adversarial policy reliably wins against most victim policies and outperforms the ZOO baseline in most environments and victims.
  • Qualitative Analysis:
    • The adversary agent did not learn a more “advanced” playing strategy
    • Masked Policies:
      • The victim’s observations of the adversary are static
      • Improves victim’s win-rate
    • For sumo ants, adversarial policies don’t work as well
      • Possibly due to low dimensionality
  • Defense: Fine-tuning. Two configurations
    • Single: only tuned against adversarial policies
    • Dual: tuned against adversarial policies and other ZOO policies
  • Summary:
    • New Threat Model
    • Verify that these adversarial policies exist for some environments
    • Verify that these adversarial policies win by confusing victims with weird actions instead of stronger policies
2021-11-16 Peyman  

Preventing Frontrunning attacks using Identity-based encryption

  • Maximal extractable value (MEV): revenue you can get by reordering, censoring or adding transactions in blocks.
    • For Ethereum: the current lower bound is 740M USD
    • High Gas fees
    • Occupation of Blocks
    • Endanger the consensus
  • Frontrunning: attack to get MEV
    • A sends her transaction (tx) to the block proposer
    • The proposer adds it to their mempools
    • B sends his tx to the proposer
    • Both A and B are in the mempool
    • The proposer can add their own txs before A’s txs even though A’s txs arrived before
    • Most of the profit goes to the frontrunners, not the miners
  • Example: Crypto-Coin NFT
    • An NFT listed at a typo’d price was bought using front-running, before the cancellation order could be processed
  • Some of the solutions that exist: with their problems:
    • Slippage in DEXes leads to Sandwich attacks
    • Flashbots: Good but not really a solution
    • Trusted Execution Environment (TEE): Significant latency, storage limit, security issues
    • Submarine sends: not cryptographically secure, just for ICOs
    • Commit-Reveal: Deciding not to reveal, connectivity and availability
    • Threshold Decryption: High bandwidth overhead
  • RQ: Can we have front-running protection that works with less bandwidth overhead?
  • Identity-based-encryption (IBE)
    • Setup:
      • A PKG (private key generator) holds a master secret key (msk) and publishes a system-wide public key (pk)
        • An identity (ID) such as an email address or IP address acts as that user’s public key
    • Encrypt: encrypt m using pk and ID, c = E_pk,ID(m)
    • Extract: S_ID= Extract(ID, msk)
    • Decrypt: m = Dec_SID(c)
  • Setup of the structure:
    • Players: Clients, Executors, Decryptors
    • Communication: smart contracts (Participate, DKG, IBE, Process, Target)
    • Adversary: controls up to n−t Executors, and not all Decryptors
  • 6 Phases:
    • 0: Participation
      • Executors and Decryptors call Participate + deposit
      • Client deposits in Participate contract
    • 1: Distributed Key Generation
    • 2: Encryption and Commitment
      • Client creates a commit contract to encrypt their transaction
    • 3: Broadcasting Block Key shares
    • 4: Decryption
    • 5: Execution
      • The Process contract:
        • Validates received transaction(s)
        • Validates the commitments in the commit contract
        • Sends incentivization feedback to the Participate contract
        • Sends the transaction amount to the Target contract
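The commitment step in phase 2 can be illustrated with a standard hash commitment (a simplification: the actual scheme commits to IBE-encrypted transactions on-chain, but the bind-then-open structure is the same):

```python
import hashlib
import os

def commit(tx_bytes):
    """Commit to a (serialized, possibly encrypted) transaction.
    The random nonce makes the commitment hiding; the hash makes
    it binding."""
    nonce = os.urandom(16)
    digest = hashlib.sha256(nonce + tx_bytes).hexdigest()
    return digest, nonce

def verify(digest, nonce, tx_bytes):
    """Check an opened commitment against the posted digest."""
    return hashlib.sha256(nonce + tx_bytes).hexdigest() == digest
```

Note that a plain commit-reveal scheme suffers exactly the "deciding not to reveal" problem listed among the existing solutions; the IBE construction avoids it because decryption keys for a block are released by the Decryptors rather than by each client.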

2021-11-16 Justin  

An Introduction to Software Reverse Engineering

  • Recap on Compiling
    • Get machine code and feed it into a disassembler to get disassembly (close to assembly code).
    • Use a decompiler to recover an approximation of the original code
    • Decompiler/disassembler can be lossy
    • Once you have decompiled C, try reversing to get an approximation of the source code
  • IDA Pro most commonly used tool:
    • Not open source, expensive
  • Ghidra:
    • Open source
    • Made by the NSA
  • Walkthrough of Ghidra
    • Difficulty is different from coding: have to find how the data is arranged in the program rather than looking at the code itself
    • Reconstruct the function from the decompiled C
    • Focus on decoding data, not code itself
    • Closer to a gigantic logic puzzle
2021-11-23 Marian  

Multi-Party Minimum Spanning Forests

  • Definition
    • Given a weighted, undirected graph, find a minimum-weight set of edges
    • Vertices that were reachable from each other before must remain reachable
  • Kruskal’s Algorithm
  • Two-Party Computation
    • 1 owns x, 2 owns y
    • Jointly compute f(x,y)
    • Can’t learn more than what the function itself reveals
  • MPC Minimum Spanning Forests (MPC-MSFs)
    • Vertices public
    • Edge set split between two parties
  • General Techniques
    • Take any MSF algorithm and turn it into a circuit
      • Problem:
        • Need random memory access
        • Oblivious RAM (ORAM)
      • Solutions tailored to the MSF problem will be much faster
  • Semi-honest O(n)-rounds protocol
    • Simulate Kruskal’s algorithm and reveal each edge immediately
    • Each party finds its smallest edge, secret shares it and compares it to find the smallest one
    • Secure because the only thing a party learns are edges that belong to the MSF
  • MSFs are not necessarily unique
    • If no weight occurs more than once, then a graph’s MSF is unique
    • Other direction does not hold
    • Use canonical order of edges to break ties
  • More MSF properties:
    • Assume each weight is unique:
      • Then for each vertex, the smallest incident edge is always part of the MSF
  • Previous property yield Boruvka’s Algorithm:
    • For each vertex, find the smallest incident edge
    • Merge vertices for which an edge was selected
    • Repeat
    • Runtime is O(log(n)) rounds; O(n) minima computed in total
    • Performs much better in high latency networks
  • O(log(n))-rounds semi-honest protocol
    • Same as O(n)-rounds version but this time with Boruvka’s algorithm
  • Random MSFs
    • All edges have the same weight
      • MSF=SF
      • Canonical ordering has high impact on the resulting MSF
    • Random MSF: choose random edge ordering
    • Alternatively: Choose random secondary weights
  • Simple solution
    • Jointly generate random secondary weights and reveal them
    • Now can execute any MSF protocol
    • Not secure, reveals too much information
    • Insecure even if only the ordering of a party’s own edges is revealed
  • O(log^2(n))-rounds protocol
    • Saves time by reading/writing multiple entries in memory simultaneously
    • Use another MSF algorithm with O(log(n))-rounds
    • Doesn’t reveal values in the middle of the protocol
      • Maliciously secure
  • Mixed Protocol
    • Useful if the number of duplicate weights is small
    • Tries to reveal selected edges as early as possible
    • Switches to non-revealing protocol only when necessary
  • Summary
    • Unique edge weights -> O(log(n))-rounds
    • Few repeated weights -> Mixed protocol
    • Many repeated weights -> Protocols without intermediate data
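Borůvka's round structure (here in the clear, without the MPC layer, and with ties broken by edge index as a canonical order) can be sketched as:

```python
def boruvka_msf(n, edges):
    """Borůvka's algorithm: each round, every component selects its
    lightest incident edge (ties broken by edge index), then components
    joined by selected edges are merged.  `edges` is a list of
    (weight, u, v) tuples over vertices 0..n-1.  Returns the set of
    edge indices in the minimum spanning forest."""
    parent = list(range(n))  # union-find with path halving

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    msf = set()
    while True:
        best = {}  # component root -> (weight, edge index)
        for i, (w, u, v) in enumerate(edges):
            ru, rv = find(u), find(v)
            if ru == rv:
                continue
            for r in (ru, rv):
                if r not in best or (w, i) < best[r]:
                    best[r] = (w, i)
        if not best:  # no component has an outgoing edge: done
            break
        for w, i in best.values():
            _, u, v = edges[i]
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[ru] = rv
                msf.add(i)
    return msf
```

Each round at least halves the number of components, giving the O(log(n)) round count that makes the algorithm attractive in high-latency MPC settings.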
2021-11-30 Navid  

Some new results on All-or-nothing Transforms and a request for help

  • t-All-or-nothing Transforms (AONT)
    • If you have less than t blocks of information, you can’t get any information/compute anything about the input blocks.
    • Entropy doesn’t change before or after you acquire the t blocks
  • Moving forward: X are the input blocks, Y the output blocks
  • An (N, k, v)-array is an N by k array, say A, whose entries are elements chosen from an alphabet T of order v.
    • We say A is unbiased with respect to a subset D of columns if every possible tuple of alphabet values occurs equally often in those columns.
    • A (t, s, v)-AONT is equivalent to a (v^s, 2s, v)-array that is unbiased with respect to a specific subset of columns.
  • Asymmetric, rectangular, range and strong, restricted are different types of AONTs.
  • If an AONT satisfies the entropy definition, then it has perfect security.
  • AONT: Weak Security:
    • We don’t need the entropy to remain the same, we’re happy if all possible t inputs have a non-zero probability.
  • If all inputs are equiprobable, then an AONT has perfect security.
  • Cases in between:
    • Combinatorial (t, s, v)-AONT
    • H(X | Y) = H(X′ | Y)
  • If the inputs are mutually independent and the attacker holds only s − t output blocks, the entropy of the remaining t input blocks is unchanged.
  • What’s next?
    • Gathering information regarding usage cases
    • Currently thinking of a survey
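For intuition, here is a toy package transform in the style of Rivest's original AONT (hash-based and illustrative only, not a construction from the talk): a random key masks every input block, and the final output block is the key XORed with a hash of all masked blocks, so missing any one output block leaves the key, and hence every input block, hidden.

```python
import hashlib
import os

def _prf(key, i):
    """Toy PRF: hash of key and block index (32-byte output)."""
    return hashlib.sha256(key + i.to_bytes(4, "big")).digest()

def aont_encode(blocks):
    """Package transform over 32-byte blocks: mask each block with a
    fresh random key, then append key XOR H(all masked blocks)."""
    key = os.urandom(32)
    out = [bytes(a ^ b for a, b in zip(x, _prf(key, i)))
           for i, x in enumerate(blocks)]
    h = hashlib.sha256(b"".join(out)).digest()
    out.append(bytes(a ^ b for a, b in zip(key, h)))
    return out

def aont_decode(out):
    """Recover the key from the last block, then unmask every block."""
    h = hashlib.sha256(b"".join(out[:-1])).digest()
    key = bytes(a ^ b for a, b in zip(out[-1], h))
    return [bytes(a ^ b for a, b in zip(y, _prf(key, i)))
            for i, y in enumerate(out[:-1])]
```

This gives the "all blocks or nothing" flavour of a 1-AONT; the combinatorial (t, s, v)-AONTs in the talk are stronger, unconditional objects rather than hash-based ones.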
2021-12-07 Nils  

Fooling Adversarial Watermark Transformer for Text Provenance Verification with Adversarial Examples

  • OpenAI GPT-3 Language model
    • Scaled parameters significantly -> improved accuracy greatly
    • Human baseline: 95% on LM dataset, GPT-3 getting close
    • Using model to generate code from sentences describing it
    • Concerns about potential misuses of GPT-3
      • Large-scale online radicalization and recruitment
    • Model can be used as an API (too big to use locally)
    • A content-warning system exists that can detect potentially dangerous topics
  • Text Provenance Verification
    • ~ Watermarking Text
    • Using synonyms
    • Hiding & revealing network
    • Rule-based methods or AWT (Adversarial Watermark Transformer)
  • Threat Model
    • Requirements
      • Efficacy
      • Secrecy: Indistinguishability
      • Robustness
    • Attacker Capabilities
      • Black-box API
      • No access to non-watermarked text
    • Attacker’s Goal
      • Preserve the service’s output utility, but remove the watermark
    • Evaluated Attacks
      • Random noise
      • Re-watermarking
      • Discriminator Training
  • AWT:
    • Language model loss
    • Semantic loss
    • Hiding & Revealing Network and a Discriminator
      • Discriminator used to discriminate watermarked from non-watermarked text
    • Can encode 4 bits per 80 words
    • There are some grammatical failure modes
    • Model is pretty diverse in how it replaces words
  • Attacks
    • The evaluated attacks do not remove the watermark
    • They only degrade sentence quality
  • Weakness in AWT
    • No inherent source of randomness during training
    • Pre-trained encoder
  • Breaking AWT’s robustness
    • Train a surrogate network, generate an adversarial attack. Use transferability. Not sure if it works yet.
  • Breaking Secrecy
    • Authors assume we cannot get watermarked, non-watermarked pairs
      • However, one could repeatedly sample semantically similar passages of text and contrast differences between sentences to reverse-engineer the encoding
  • Breaking Efficacy
    • Generate input sentence blocks that appear benign, but are modified to a malignant sentence by AWT.
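A crude rule-based synonym watermark (one of the baseline styles mentioned, not AWT itself, and with a hypothetical synonym table) can be sketched as:

```python
# Each synonym pair encodes one bit: the first word is 0, the second is 1.
# The pair list here is an illustrative stand-in, not a real rule set.
PAIRS = [("big", "large"), ("quick", "fast"), ("begin", "start")]

def embed(words, bits):
    """Rewrite each synonym occurrence to carry the next payload bit."""
    it = iter(bits)
    out = []
    for w in words:
        for b0, b1 in PAIRS:
            if w in (b0, b1):
                try:
                    w = (b0, b1)[next(it)]
                except StopIteration:
                    pass  # payload exhausted: leave remaining words as-is
                break
        out.append(w)
    return out

def extract(words):
    """Read bits back from which member of each pair appears."""
    bits = []
    for w in words:
        for b0, b1 in PAIRS:
            if w == b0:
                bits.append(0)
            elif w == b1:
                bits.append(1)
    return bits
```

The contrast with AWT is exactly the talk's point: a fixed rule table like this is trivially learnable from sampled text pairs, while AWT's learned, context-dependent substitutions are harder, though (per the attacks above) perhaps not impossible, to reverse-engineer.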
2021-12-07 Stan  

On Variety

  • Lots of bots and trolls on the internet
    • Fake human profile pictures
    • How to handle them?
      • Human token
        • Using government id
        • Paying some money?
      • Captchas
      • NFTs
        • Receipt of ownership on a public ledger
  • Private Reputation Supporting Ongoing Network Avatars
  • Fun
    • Related to novelty
    • Animals also seek fun
  • Covid-19 and boredom (in humans)
    • Difficult to study in a laboratory setting
    • Humans don’t get bored enough in natural settings to study either
    • Boredom researchers have been using the pandemic as an opportunity to gather data on what people actually do when confronted with boredom
  • Boredom and variety
    • Inconclusive connection between boredom and creativity
    • Boredom is a response to a context
  • There is value in spending time on variety and novelty in work/research
Topic revision: r1 - 2022-01-07 - LucasFenaux
 