This is an archive of weekly meeting talks from 2021. See the main meeting page for the talks from the current year, and links to archives from other years.
Date |
Speaker |
Links from talk |
Summary |
2021-01-13 |
Cecylia |
Papers: |
Tor Bridge Distribution: Tor Bridges - What is a Tor Bridge?
- Tor relays are public information and are therefore easy for censors to block
- Bridges help solve this. Bridges are relays that aren't put in the consensus of Tor relays
- There are several types of Bridges:
- Default Bridges: Baked into Tor browser
- Private Bridges: Must be distributed manually
- Distributed Bridges: A database of bridges is managed by the Tor Project, and users can request info on a bridge via email, HTTPS, or Tor's moat
- We are assuming we have a way to distribute bridges to users
- Censors can also request this info using above mentioned methods
- The Tor Project has a few mechanisms to slow censors from enumerating known bridges, but bridges are still being blocked by censors quite quickly
- Potential Solution: reputation based bridge distribution system. If a bridge gets blocked, the user is less trustworthy. The longer a bridge is unblocked, the more trustworthy the user is.
- See the "links from talk" section for papers on existing schemes
Open Problems: - Sybil Defences:
- Salmon requires a Facebook account
- rBridge/Hyphae are invite-only
- How do we determine when a bridge is blocked?
- Self-reported?
- OONI?
- Censored Planet?
- Privacy:
- How do we prevent leaking social graph data?
- Possibly future work with Hyphae's anonymous credentials
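The reputation idea above can be sketched in a few lines. This is an illustrative toy, not rBridge, Salmon, or Hyphae: the class name, constants, and update rules are all invented for the example.

```python
class BridgeUser:
    """Hypothetical user record in a reputation-based bridge distributor.

    The trust-update constants here are made up for illustration; real
    schemes (rBridge, Salmon) use more sophisticated credit functions.
    """
    def __init__(self):
        self.trust = 1.0          # starting reputation
        self.bridges = []         # bridges currently assigned to this user

    def on_bridge_blocked(self, bridge):
        # A blocked bridge suggests this user (or someone they shared with)
        # leaked it to a censor: cut their trust.
        self.trust *= 0.5
        self.bridges.remove(bridge)

    def on_bridge_survived(self, days_unblocked):
        # The longer the user's bridges stay reachable, the more we trust them.
        self.trust += 0.1 * days_unblocked

    def may_request_new_bridge(self, threshold=1.0):
        return self.trust >= threshold
```

A censor-controlled account that burns bridges quickly loses the ability to request more, while honest long-term users accumulate trust.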
|
2021-01-13 |
David |
Chipwhisperer: https://www.newae.com/chipwhisperer |
Discussed and demonstrated a side-channel attack on an Arm STM32 chip. ChipWhisperer sells kits that can be used for attacks similar to those demonstrated by David. Kits come with everything you need, and Python is used to interface with them |
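The style of attack a ChipWhisperer enables (correlation power analysis) can be sketched offline with synthetic traces. Everything below is simulated: a toy "device" leaks the Hamming weight of (plaintext XOR key) plus noise, and the key byte is recovered by correlating hypothetical leakage against the traces. No ChipWhisperer hardware or API is used.

```python
import random

def hamming_weight(x):
    return bin(x).count("1")

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs) ** 0.5
    vy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (vx * vy) if vx and vy else 0.0

random.seed(1)
SECRET_KEY = 0x3C
plaintexts = [random.randrange(256) for _ in range(200)]
# Simulated power measurements: true leakage plus Gaussian noise.
traces = [hamming_weight(p ^ SECRET_KEY) + random.gauss(0, 0.5) for p in plaintexts]

# Attack: the key guess whose predicted leakage best matches the traces wins.
best_guess = max(
    range(256),
    key=lambda k: pearson([hamming_weight(p ^ k) for p in plaintexts], traces),
)
# best_guess should equal SECRET_KEY at this noise level and trace count
```

With real hardware, `traces` would come from captured power measurements instead of a simulation; the statistics are the same.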
2021-01-20 |
Asokan |
Papers: in-toto |
Code Signing: Developer has a key pair and signs code using the private key; users verify with the public key. SolarWinds Attack (2020) - System/network management vendor
- Network management system (Orion) <- needs to be able to monitor all devices in a network, manage updates, etc
- Malware ended up getting signed
Software supply chains: - Software distribution infrastructure
- example: CI, Build, source repo, packaging
- In the case of SolarWinds, the attacker inserted malware into the build system or into the channel between the source repository and the build system
in-toto (USENIX Security 2019) (see links for the paper) Threat model: - interpose
- compromise
- incomplete product
- counterfeit
Goals: - Security requirements
- Integrity of supply-chain layout
- Integrity of flows between steps
- Step authentication
- Functional requirements
- Generality and transparency
- graceful degradation of security
in-toto Design: Define a graph that links steps together. The project owner has a key pair. A functionary is the entity that runs a step and also has its own key pair. The layout defines who is allowed to run each step. When a functionary completes a step, link data is generated and sent to the next step with a list of materials produced. Each step has a threshold to get a consensus (e.g. multiple repos). Example in-toto deployment: Debian reproducible builds, with 40%-100% space overhead. Problems: - space overhead?
- Protecting the project owner?
- thresholding
- offline keys
- topology privacy?
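The link-data idea can be sketched as follows. This is a loose illustration of the concept, not the actual in-toto metadata format: in-toto uses asymmetric signatures and a richer layout, whereas HMAC here is just a stand-in for "the functionary's key".

```python
import hashlib
import hmac
import json

def artifact_hash(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def make_link(step_name, materials, products, functionary_key: bytes):
    """Record what a step consumed (materials) and produced (products), signed."""
    link = {
        "step": step_name,
        "materials": {name: artifact_hash(d) for name, d in materials.items()},
        "products": {name: artifact_hash(d) for name, d in products.items()},
    }
    payload = json.dumps(link, sort_keys=True).encode()
    link["signature"] = hmac.new(functionary_key, payload, hashlib.sha256).hexdigest()
    return link

def verify_link(link, functionary_key: bytes) -> bool:
    unsigned = {k: v for k, v in link.items() if k != "signature"}
    payload = json.dumps(unsigned, sort_keys=True).encode()
    expected = hmac.new(functionary_key, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, link["signature"])
```

A verifier would additionally check that step N's materials match step N-1's products, i.e. that nothing was interposed between steps.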
|
2021-01-20 |
Simon |
Papers: Search pattern leakage in searchable encryption Access pattern disclosure on searchable encryption: Ramification, attack and mitigation |
New Symmetric Searchable Encryption scheme: Alice has a data set; she encrypts the database using a symmetric searchable encryption (SSE) scheme. Queries are hidden: the server learns which files match, but not the query itself. The access pattern can still leak, and common queries could be inferred. If queries don't look the same but the access pattern is the same, it can leak the query. New scheme (DP access patterns for searchable symmetric encryption): add false positives to the database. Design goals: encrypted db, random tokens, fresh access-pattern randomness. Performance metrics: - communication
- storage at client
- computation
Utility metrics: - true positive rate (close to 1)
- false positive rate (close to 0)
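The false-positive idea and the two utility metrics can be illustrated with a toy index obfuscator. This is not the OSSE construction (which works on encrypted polynomials); it only shows how randomizing (document, keyword) matches trades utility for access-pattern privacy. Parameter names are invented.

```python
import random

def obfuscate_index(index, tpr=0.999, fpr=0.01, seed=0):
    """index: dict keyword -> set of matching doc ids.

    Keep each true match with probability tpr (rare false negatives) and
    inject each non-match with probability fpr (false positives), so the
    server's view of which documents match a query is randomized.
    """
    rng = random.Random(seed)
    docs = set().union(*index.values())
    noisy = {}
    for kw, matches in index.items():
        noisy[kw] = set()
        for d in docs:
            if d in matches:
                if rng.random() < tpr:
                    noisy[kw].add(d)
            elif rng.random() < fpr:
                noisy[kw].add(d)
    return noisy
```

With tpr close to 1 and fpr close to 0 the results stay useful, while two queries with the same true result set no longer produce identical access patterns.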
Privacy: Based on IPPE (Inner Product Predicate Encryption). The adversary can only compute the inner product of two ciphertexts, but nothing else. Proposed scheme: OSSE (Obfuscated SSE). Database and table, h: [n] -> [sizeof(h)]. Encrypt a polynomial and attach a label; queries will encrypt the query and attach a label. The adversary will return a document if one of the x's is a root of an encrypted polynomial. Polynomial generation: r_1 = (DOG + label + counter). A special root is used as a false positive. Token generation: to find documents with "Dog", for l from 1 -> sizeof(h): for c from 0 -> C_max: x = (Dog + l + c) <- token/query + randomness for false negatives. False positives: for id from 1 -> n: x = (id + 0 + -1) + randomness deciding whether we send the query or not. Non-matches: for l from 1 -> sizeof(h): x = AAA + l + 0. DP analysis: epsilon = 13 (quite large) but good enough to defeat attacks. Communication overhead quite good. Computational complexity: comp < n(C_max + 1). Client storage: very small burden. Evaluation: Frequency attack: Search pattern leakage in searchable encryption (see paper links). As the number of queries increases, leakage increases; as the false-positive rate increases, leakage decreases. IKK - Access pattern disclosure on searchable encryption: Ramification, attack and mitigation (see paper links): accuracy lower than against previous schemes. Conclusions: Hiding the search pattern is hard but effective against attacks. OSSE: 1 communication round, no client storage, hides the search pattern, better asymptotic communication than ORAM. Computationally expensive: 30 min vs CLRZ's 20 ms |
2021-01-27 |
Matthew R |
|
Project H.O.N.K. Harmonized context detection framework Background: - Context detection is using phone sensors to gather data about the environment
- Trying to make a context-detection framework that will allow other applications to learn about the context the device is operating in
Explained using Untitled Goose Game! Basic idea: Privacy: "How many people are around?" Proximity: "How close is my device?" Unfamiliarity: "How many strangers are around?" How do we calculate these values? Using device sensor data: - GPS
- Bluetooth
- wlan
- Accelerometer
Eventually: - Biometric Data
- Application usage
What is it used for? - Authentication
- Device Theft
- Device loss
How have you implemented it? There's an Android app, but testing during COVID is hard. Currently testing using MDC. Why MDC? |
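The three context values from the talk can be sketched from a single Bluetooth scan. This is an invented illustration, not the HONK implementation: the function name, data format, and "strongest RSSI = closest" heuristic are all assumptions for the example.

```python
def context_scores(scan, familiar_devices):
    """scan: list of (device_id, rssi) sightings from one Bluetooth scan.

    Returns rough stand-ins for the talk's three values:
    crowd ("how many people are around?"), proximity ("how close?"),
    and unfamiliarity ("how many strangers?").
    """
    devices = {dev: rssi for dev, rssi in scan}
    crowd = len(devices)                              # distinct devices seen
    proximity = max(devices.values(), default=None)   # strongest signal ~ closest
    strangers = sum(1 for dev in devices if dev not in familiar_devices)
    return crowd, proximity, strangers
```

A real framework would fuse this with GPS, WLAN, and accelerometer data rather than rely on one noisy scan.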
2021-01-27 |
Urs |
|
Having Fun with COVID Alert or "How I have spent some of the evenings during the pandemic" COVID Alert: - Canada's exposure notification app
- Smartphone app that broadcasts Bluetooth signals to phones in proximity
- When users test positive, nearby users will be notified of the exposure
- Based on the Google/Apple API, built by some volunteers at Shopify, security eval by BlackBerry, and maintained by the Canadian Digital Service
- Open-source on GitHub
Fun Activity #1 - Look at the code: Uses React Native, which makes the code really hard to follow
- Let's focus on the upload part
- When user tests positive, app uploads to Health Canada the secret codes it used for generating the broadcast signals
- There is a traffic analysis threat
- Observing this connection reveals user�s info
Traffic analysis threat: - ISP, cellphone op, NSA, or Amazon can learn of your infection
- Notified CDS, Health Canada, OPC and IPC
- Fix based on dummy uploads promised but nothing has happened
- Germany, Switzerland, Italy, Belgium have this fix
- HC, OPC, and IPC released privacy assessments accompanying the launch of COVID Alert, but none mentions this vulnerability
Fun Activity #2 - Does anyone actually use COVID Alert for reporting infections?
- App downloads the secret codes uploaded by infected users to learn of exposures
- Let's write a script that downloads these codes
- The codes use HMAC-SHA256 and need a secret key
- The COVID Alert GitHub repo doesn't have the key
- Every app has the key
- Found the key, and it's now on GitHub
Fun Activity #3 - How many infected people upload their secret codes?
- Every day the app creates a new Temporary Exposure Key (TEK)
- The app's broadcast Bluetooth signals are derived from today's TEK and changed every 15-20 mins
- Somebody downloading the TEKs will be able to link Bluetooth signals derived from the same TEK
- Upon an infection, the TEKs for yesterday, two days ago, ..., 13 days ago are uploaded.
- Today's TEK will be uploaded tomorrow
- More general: for the next 14 days, the previous day�s TEKs will be uploaded.
Going from TEKs to number of infected Users: - Each TEK comes with a validity period, typically one day
- At midnight, download the TEKs uploaded the previous day and try to find sequences of length 13
- Sequences may not be complete
- Incomplete sequences from the entire day are merged together and may result in fewer sequences
- Increase download frequency: every hour, download the TEKs uploaded so far today
- The difference between two subsequent downloads gives us the TEKs uploaded the previous hour
- Will miss sequences where there were no TEK for yesterday
- Will miss parts of a sequence after a gap
- Sequences of length 1 are either newly reported infections or follow-ups for an already reported infection
- #infections = number of sequences with length at least 3 (somewhat arbitrary choice)
- Will miss sequences with a gap the day before yesterday or two days ago
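The counting heuristic above can be sketched directly: group each batch of newly uploaded TEKs into runs of consecutive validity days, and count an infection for every sufficiently long run. The data format (one list of validity days per upload) is invented for the example.

```python
def count_infections(upload_batches, min_len=3):
    """upload_batches: list of uploads, each a list of TEK validity days (ints).

    An upload with a run of at least min_len consecutive days is counted as
    one reported infection; short runs are treated as follow-up uploads.
    """
    infections = 0
    for days in upload_batches:
        days = sorted(set(days))
        run = 1
        longest = 1 if days else 0
        for prev, cur in zip(days, days[1:]):
            run = run + 1 if cur == prev + 1 else 1
            longest = max(longest, run)
        if longest >= min_len:
            infections += 1
    return infections
```

As the notes observe, sequences with gaps or missing early days will still be miscounted by any threshold-based rule like this.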
Urs' numbers are 10% below the ones reported irregularly by Health Canada: 10% of people who have entered an OTK have not uploaded their codes. Numbers now match. Significantly fewer follow-up uploads than expected; users giving no consent? Spikes are not a good predictor for spikes in overall infection numbers. Fun Activity #4 - The development of contact tracing apps started in April 2020 to assist manual contact tracing
- Unfortunately, there was not much discussion on why this was the most useful app to have, given the limitations of smartphones and the propagation behaviour of the virus
- Magic exposure condition of 2 meters for 15 mins
- 2 meters is too simple; other factors like environment and air flow matter
- Bluetooth signal attenuation is not a good proxy for estimating distances
Main "discussion" was centralized vs decentralized architecture, led mainly by security/privacy researchers - Privacy by Obscurity
- Decision about type and design of app was made by Google/Apple
- So we ended up with not particularly useful but privacy-protecting exposure notification apps
- Arguably, some of the privacy is "privacy by obscurity"
- Android/iOS hide precise time and duration of exposure from alerted users
COVID Alert and Backward Tracing - Fall 2020: infection numbers started to rise again, importance of backward tracing emerged
- Can we use COVID Alert system for help with backward tracing?
- Easy when deploying our own clients on non-Google Play/iOS devices
- But most people use iOS or Android with Google Play
- iOS filters exposure notification Bluetooth beacons from any third party app so there is not much we can do in iOS
- Google will kick apps that specifically look for these beacons out of the Play Store
Making COVID Alert more useful - Let's write a Bluetooth scanner app that also records locations of sightings
- RaMBLE already does this!
- All we need is an app that processes RaMBLE's recorded sightings, tries to find matches in the recordings, and visualizes them
- Someone in Germany already did this!
- Urs also contributed to an OpenStreetMap
RaMBLE and Warn-App-Companion - If you use COVID Alert and want it to be more useful in case of an exposure
- Install RaMBLE and have it continuously record Bluetooth sightings
- In case of an alert from COVID Alert, install the Warn-App-Companion app and use it to learn the precise time and location of the exposure
- You can also periodically launch the Warn-App-Companion app to learn of exposures that did not satisfy the 15 mins and Bluetooth attenuation thresholds required by COVID Alert
|
2021-02-03 |
Masoumeh |
|
Privacy for [Correlated] Enterprise Communications Project Setup: Graph structure; email communications; publishing ngrams - Preserving information confidentiality - differential privacy
Assumption: - confidentiality captured in local communications (cliques)
General Goal: - Learn from enterprise communication without violating privacy
- Differential Privacy in Social Networks:
- Edge-level Privacy
Node-level Privacy Correlation in Data Analysis Conventional differential privacy: participation of a record Real picture: Inferential Privacy Correlation in Ngram Analysis: Conventional differential privacy: removing an edge (independent edges) Our work: removing an edge and its correlated info Group privacy Correlation Models - Clique: captures local communications, but infinite sensitivity! Pufferfish Privacy (S, Q, O) - Secrets S: ngrams on the edges
- Indistinguishable pairs Q: existence/absence of ngrams in an edge
- Data distributions set O captures correlation
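The group-privacy intuition behind the clique model can be illustrated with a plain Laplace mechanism: when an ngram is correlated across the edges of a clique, removing it can change the count by up to the clique size, so the noise scale must grow with that sensitivity. This is a generic sketch, not the Wasserstein mechanism used in the actual work.

```python
import math
import random

def laplace_noise(scale, rng):
    # Inverse-CDF sampling of a Laplace(0, scale) variate.
    u = rng.random() - 0.5
    return -scale * math.copysign(1, u) * math.log(1 - 2 * abs(u))

def noisy_ngram_count(true_count, clique_size, epsilon, rng=None):
    """Release an ngram count; correlated copies within a clique move together,
    so the sensitivity (and hence the noise scale) is the clique size."""
    rng = rng or random.Random()
    sensitivity = clique_size
    return true_count + laplace_noise(sensitivity / epsilon, rng)
```

The open problem in the talk follows directly: a tighter correlation model means smaller effective sensitivity and therefore less noise for the same privacy level.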
The query F(X): the count of ngrams in the graph. Uses the Wasserstein Mechanism. Gives an average correlation of edges. Open problems: The more accurate the correlation model, the more private we can be. Applying sensitivity-reduction techniques from node-level privacy. Ngram correlation. Other correlation models. Other neighbourhood models. Application to old NLP tasks (model building, topic modelling, ...) |
2021-02-03 |
Andre |
|
Automatic speaker verification (ASV): perform voice biometric auth. Categories: Text-dependent: user auths using the same enrolment phrase. Text-independent: arbitrary auth phrase spoken by the user. Text-prompted: arbitrary auth phrase chosen by the system. ML-based features: MFCC, LPMS, CQTT. Use cases: access control, forensics, surveillance applications. Problem: spoofing attacks: impersonation, text-to-speech, voice conversion, replay. Spoofing countermeasures (CMs): add an extra layer of security; classifiers that distinguish between spoofed and bona fide samples. ASV + CM - The next big thing! Banks use it! Ultimate solution or a false sense of security? What does the literature tell us? Several attacks on ASVs; attacks on CMs; ASV+CM? - Not a single attack! Impermeable? (No, just harder)
Threat model: Target system employs ASV+CM. Attacker places a call from a device they own. Text-prompted: replay attacks won't work; impersonation is not a realistic threat. Bottom line: the attack needs to provide a sample that simultaneously evades the CM and the ASV system. The content is determined by the server, so this has to be done in realtime. Systems vary drastically in terms of architecture and features. Adversarial examples: novel inputs crafted with some knowledge of the attacked system. The goal is to perturb an input sample such that: - Perturbation is imperceptible to a third party
- The perturbation causes the target model to misclassify
Adversarial examples have been shown to be "transferable". Challenges: The inputs to the two systems are two different sets of features. Adversarial example generators perturb the input features; perturbing the two sets simultaneously is not possible. Adversarial perturbations are by nature "unstable". Paradigm shift: To craft adversarial examples that can bypass the two systems simultaneously, we need to have the CM and ASV operate on the same input. Idea: Incorporate the feature extraction stage into the attack pipeline: when attacking a white-box system, add another layer that extracts the features in realtime and then passes those to the next stage as part of the pipe. Ensure the model's feedback can be propagated back to the first stage; for gradient-based models, this corresponds to backpropagation of the gradients. Now that the two models accept the same input, introduce perturbations to jointly minimize two loss functions. Implementation and models: Used code on top of TensorFlow and PyTorch. Used Compute Canada (2 T4 cards, 16G). Results: Baseline: low attack success, at most 2.25%; ASV, < 1% attack success. Our attack: 24% < success rate < 43%. What's next? Speech-to-text test: to be considered a practical threat. Human evaluation. Future work: Execute an end-to-end realtime attack |
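The joint-optimization idea (perturb one shared input until both models accept) can be reduced to a toy: two linear scorers stand in for the CM and ASV, and gradient steps on whichever score is still too low perturb the input until both are fooled. Real systems are deep networks with separate feature extractors; everything here, including the weight vectors, is invented to illustrate the shared-input pipeline only.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def joint_attack(x, w_cm, w_asv, steps=200, lr=0.05):
    """Perturb x so that both linear 'models' score it positively."""
    x = list(x)
    for _ in range(steps):
        for w in (w_cm, w_asv):
            if dot(w, x) < 1.0:
                # Gradient ascent on this model's score (hinge-style objective).
                x = [xi + lr * wi for xi, wi in zip(x, w)]
    return x

w_cm = [1.0, -1.0]    # hypothetical countermeasure decision boundary
w_asv = [0.5, 1.0]    # hypothetical speaker-verification boundary
adv = joint_attack([-1.0, -1.0], w_cm, w_asv)
assert dot(w_cm, adv) > 0 and dot(w_asv, adv) > 0   # fools both simultaneously
```

Optimizing each model alone would typically break the other; alternating (or summed) gradient steps on a shared input is the point the talk's paradigm shift makes.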
2021-02-10 |
Shannon |
Papers: Original Protocol: Attack 1: New Protocol: Attacking the new protocol: |
History of Authenticated Key Exchange Unauthenticated Key exchange - Explicit auth using digital sigs
- Implicit auth with keys as long-term credentials
Done before with Diffie-Hellman. Can we do the same with (R)LWE-based key exchange? LWE problem: Given (A, b), where A is a matrix and b = As + e mod q, find s. RLWE: Given (A, b), where A is in a polynomial ring and b = As + e mod q, find s. Attack 1: Find s_B. Send p_A = 0, then p_A = 1, p_A = 2, ...; w_B = Sig(p_A s_B + 2g_B), and s_B doesn't change between runs. To find the signs, repeat the process using p_A = (1+x)k, k in {0, ..., q-1}. We now have s_B or -s_B. Key exchange with reusable keys? Claim: the new signal function doesn't leak any information about the key. Protocol: same as previous except for the computation of k_B and k_A (Ding, Branco, Schmitt (2019/665), Leakage of signal function with reused keys in RLWE key exchange). Attack 2: What's new with w_B? It is constructed as As_B + known info + error: fixed when p_A is fixed, varies when Eve's "identity" changes. Send p_A = k, k in {0, ..., q-1}; collect signals when our known value is bounded by some h. Repeat with p_A = (1+x)k, k in {0, ..., q-1}. The original attack took 3.8 hours; ours takes 1 min. Attack 2 took 50 minutes |
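The LWE problem statement above, as a tiny concrete instance: given (A, b) with b = A·s + e (mod q) and small error e, recovering s is assumed hard. The parameters below are toy-sized for illustration only; anyone holding s can strip the error, while (A, b) alone looks random.

```python
import random

def lwe_instance(n=4, m=6, q=97, seed=0):
    """Generate a toy LWE instance: A is m x n, b = A s + e mod q, e small."""
    rng = random.Random(seed)
    A = [[rng.randrange(q) for _ in range(n)] for _ in range(m)]
    s = [rng.randrange(q) for _ in range(n)]
    e = [rng.choice([-1, 0, 1]) for _ in range(m)]   # small error vector
    b = [(sum(a * si for a, si in zip(row, s)) + ei) % q
         for row, ei in zip(A, e)]
    return A, b, s, e

A, b, s, e = lwe_instance()
# Knowing s, the error is easy to recover: b - A s mod q must be small.
recovered_e = [(bi - sum(a * si for a, si in zip(row, s))) % 97
               for row, bi in zip(A, b)]
assert all(r in (0, 1, 96) for r in recovered_e)     # e mod 97 is in {-1, 0, 1}
```

The RLWE variant replaces the matrix A with multiplication in a polynomial ring; the signal-leakage attacks in the talk exploit what the signal function w_B reveals about s_B when keys are reused.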
2021-02-10 |
Georgios |
|
Leader Based Consensus Protocols Blockchains can be broken into two groups: - Longest chain based
- Voting based <- Focusing on these protocols
Most Distributed Ledgers today are Blockchains Start at round 0, known as genesis block B^0 A leader is elected at round 1, proposed B^1 extending B^0 � B^1, B^2,� are eventually committed What differentiates DLTs? How participants/leaders are elected - Proof of work
- Proof of Stake
- Permissioned
How transactions are finalized - Blockchains (longest chain, voting) - DAG: Avalanche, IOTA. Participant election process: PoW, PoS, Permissioned. Consensus finality rules: Nakamoto, classical, DAG. Classical Consensus: deterministic finality through voting. Start at a known genesis block B^0; a leader is elected at round 1 and proposes a new block extending the old block; validators run the voting. Network model: Synchrony (Nakamoto) - Once an honest participant sees a message, every other honest participant will see it within some time
Asynchrony - messages take arbitrarily long to propagate but are eventually delivered
Partial Synchrony (Classical) - Network alternates between synchrony and asynchrony
Formalization: Network is asynchronous UNKNOWN until Global Stabilization (GST), after GST it is synchronous State of the art protocols - Algorand
- Tendermint
- HotStuff
In general, use proof of stake, but elect leaders and voters differently Model: - Fixed number of N=3f+1 validators, with 2f+1 honest and f malicious
- Leaders are elected by rotating validators in a round-robin fashion
- Votes are aggregated through threshold BLS, so each validator receives O(1) aggregate votes per voting step
The big takeaway is that the protocols looked at are all very simple. They could be explained to a strong undergrad in about an hour
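The model above really is that simple: with a fixed validator set of N = 3f + 1, round-robin leader election is just indexing modulo N. A minimal sketch (validator names invented):

```python
def make_validator_set(f):
    """N = 3f + 1 validators tolerates f Byzantine ones (2f + 1 honest)."""
    n = 3 * f + 1
    return [f"validator-{i}" for i in range(n)]

def leader(validators, round_number):
    # Round-robin rotation: the leader of round r is validators[r mod N].
    return validators[round_number % len(validators)]

vals = make_validator_set(f=1)          # N = 4
assert leader(vals, 0) == "validator-0"
assert leader(vals, 5) == "validator-1"
```

Algorand, Tendermint, and HotStuff differ mainly in how leaders/voters are chosen and how votes are aggregated (e.g. threshold BLS giving O(1) aggregate votes per step), not in this basic structure.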
2021-02-17 |
Oneeb |
Papers: |
Voice Assistants - Connected with smart devices (e.g. Amazon Alexa)
- Ubiquitous (more than 3.25 billion voice assistant users in 2019)
- Convenient to use
- Appeals to people because of no physical interaction
The problem: - Concerns over security
- Audio replay attacks
- Skill squatting (Register same name, Word Obfuscations)
- Access Control Vulnerabilities
Threat Model: - Access system functionalities
- "Hey Alexa, order this for me"
- Credential Theft
How are applications secured? - There is a manifest of components
- There should be access control checks
Problems with existing permission re-delegation detection: - hard to find these privileged code areas; approaches: Static Analysis and Dynamic Analysis
Static Analysis -> a lot of manual effort but little output; implementations are obfuscated Dynamic Analysis: - Run-time analysis
- Automation
- Many existing tools, but limited to GUI based testing
- Natural Language Processing might be required
Currently working on developing/extending a dynamic analysis tool that works for voice-assistant Android applications. Currently exploring Snowdrop (2017)
2021-02-17 |
Vasisht |
|
What are the different privacy risks for machine learning mechanisms that use graphs? Graph Embedding Algorithms: low-dimensional representations for graphs that capture the original graph semantics, released for further computation (recommendation systems). Graph Neural Networks: state-of-the-art performance; use embeddings for downstream tasks: link prediction, node classification, and clustering. Graph learning: Message Passing Algorithm - weighted aggregation of features from neighbouring nodes. Graph NNs: Graph Data -> Embedding -> predictions. Similar to convolution over neighbouring nodes. Inductive training on different training and test graphs. Privacy Attacks on Graph Learning: - Membership Inference Attack
- Graph Reconstruction Attack
- Attribute inference Attack
Membership Inference Attacks Overview: Embeddings and GNNs overfit on training data - Distinct members and non-members
Adversary exploits this to infer membership status Two attack scenarios: blackbox (predictions) and white box (embeddings) Blackbox Membership Inference Attacks Shadow Model attack - Attack preparation (train shadow model to mimic target model performance, train attack model to distinguish using prediction from shadow model)
- Attack execution (Send output predictions from target inputs to attack model)
- Leak membership status with 15% on average inference advantage
Confidence Score Attack - Exploit: model more confident on seen training data than test data
- Attack: If maximum confidence across all classes > threshold => member
- Leak membership status with 28% on average inference advantage
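The confidence-score attack is simple enough to state in two lines: guess "member" whenever the model's maximum class confidence exceeds a threshold. The softmax vectors and threshold below are made up for illustration.

```python
def confidence_mia(prediction, threshold=0.9):
    """prediction: softmax scores over classes for one query to the target model.

    Exploits the observation that models are more confident on data they were
    trained on. Returns True when we guess 'member'.
    """
    return max(prediction) > threshold

assert confidence_mia([0.97, 0.02, 0.01])        # overconfident: likely a member
assert not confidence_mia([0.40, 0.35, 0.25])    # uncertain: likely a non-member
```

In practice the threshold would be calibrated on a shadow model or held-out data; the 28% advantage figure above comes from such a calibrated version.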
Influence of Neighbouring Nodes: Increase the range for aggregating features from neighbours by increasing layers. Attack and prediction accuracy decrease on increasing range - over-smoothing of features reduces distinguishability
MIA influenced by structured data with preferential node connections Whitebox Membership Inference Attacks Attack Methodology - Attack Preparation (train encoder-decoder to get single value from embedding, cluster the membership values from the trained encoder)
- Attack Execution (Embeddings for target input is clustered as member or not)
38% advantage Graph Reconstruction Attack Attack: - Prior: Aux subgraph sampled from target graph distribution
- Attack Preparation (Train graph encoder-decoder on aux graph data)
- Attack Execution (Trained decoder to generate graph from released target embedding)
Link Inference (Infer a potentially sensitive connection between two target nodes) Were able to reconstruct the graph with high accuracy Attribute inference Attack Attack: - Prior (embeddings for some nodes and their attributes)
- Attack Intuition (Preferential node connection with similar attributes, Neighbouring nodes mapped to close embeddings)
Attack Prep - ML Attack Model: Embeddings -> sensitive attributes
Attack Execution - Attack model outputs attributes for target embeddings
Conclusion Releasing embeddings leaks sensitive information - Leak membership status of user�s node
- Reconstruct target graph structure given the embeddings
|
2021-02-24 |
Florian |
|
Joins Over Encrypted Data (Practice Talk) Context: Cloud computing is here in a big way (revenue reaching $360 billion by 2022). Security challenges: What were/are the greatest barriers for adoption? 55% security concerns! 54% geographic (also related to security, from government policies). "Can I trust my data with a cloud provider?" Adversaries in the cloud: malicious insiders, criminals after high-payoff targets, nation-state actors. There is a solution to this! Encrypt your data, upload to the cloud, perform computation on encrypted data, and then download it (homomorphic encryption, at least theoretically, is feasible for this). As it stands, homomorphic encryption is not capable of performing database operations efficiently enough. Database as a service: in the cloud, we have a database management system; the client issues queries over the network; this can be part of a data centre infrastructure. Joins: Implementing secure joins is extremely difficult. Not revealing the join pattern, you have to include everything (the result table is n^2). Revealing the join pattern reveals the frequency, which is detrimental in cryptanalysis. Non-interactive: First attempt (HILM02): deterministically encrypt each value - No need to change join algorithm in DBMS
- Very fast
- Snapshot attacker can retrieve join pattern
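A minimal sketch of this first attempt, using HMAC as a stand-in PRF for the deterministic encryption (the construction in HILM02 may differ): equal plaintexts map to equal tokens, so an unmodified hash join works on ciphertexts, which is also exactly why a snapshot attacker learns the join pattern.

```python
import hashlib
import hmac

def det_encrypt(value: str, key: bytes) -> str:
    # Deterministic: the same value under the same key always yields the same token.
    return hmac.new(key, value.encode(), hashlib.sha256).hexdigest()

def encrypted_join(table_a, table_b, key):
    """Hash join on the deterministically encrypted join column."""
    enc_b = {}
    for row in table_b:
        enc_b.setdefault(det_encrypt(row["join"], key), []).append(row)
    return [(ra, rb)
            for ra in table_a
            for rb in enc_b.get(det_encrypt(ra["join"], key), [])]

key = b"column-key"
A = [{"join": "alice", "x": 1}, {"join": "bob", "x": 2}]
B = [{"join": "alice", "y": 10}, {"join": "carol", "y": 30}]
assert [(ra["x"], rb["y"]) for ra, rb in encrypted_join(A, B, key)] == [(1, 10)]
```

The later schemes in the talk (onion wrapping, KP-ABE, inner-product encryption) all aim to keep this join speed while shrinking what the tokens reveal.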
Second attempt: (PRZB11) Onion-encrypt each deterministic encryption - Cold database does not leak any info
- joins on hot database very fast
- hot database leaks join pattern
Joins over encrypted data with fine-granular security (IEEE ICDE 2019): a different way to onion-wrap - Key-policy attribute-based encryption (KP-ABE)
- Encrypt data with attributes
- Generate key for specific attribute policy (selection values in the query)
- Decryption only works for cipher texts whose attributes match the policy
Conclusion of fine-granular security: KP-ABE-encrypt the deterministic encryption - Hot database only partially decrypted
- Joins on hot database still fast
- Series of queries leak more than their union
Can we remove the leakage? Yes! Equi-joins over encrypted data for series of queries (coming soon, arXiv 2021) - Use inner-product encryption, such that the decryption result matches. Inner-product encryption: - encrypt a vector of numeric plaintext
- Generate key for that vector
- Decryption results in one-way function of the inner product
Security claim: The leakage of a series of queries is the union of their join patterns: Proof idea: Consider a pair of decrypted dot products - Case analysis over (non-)equal query id/join value/selected attributes (8 cases)
2 examples: - Different query id: different keys k
- Selected attributes don't match: random dot products
Assuming n dot products there are n(n-1)/2 potential collisions (birthday paradox) Performance results: Performance depends on the number of attributes and possible selection attribute values Encryption: 3.4ms to 9.6 ms (per update) Key get: 2ms (per query) Decryption 21.2 to 53 ms (per selected database tuple) - Hash-based join algorithm
Operations are all essentially linear! Conclusion of Series of Queries: IPE-encrypt join value and query-dependent key - Leaks only the union of the join patterns
- Snap-shot attack can be unsuccessful
- Join on hot database can be slow
What can we do with a single round of interaction? Oblivious joins: Perform the same instructions and memory accesses for a join of tables of the same size n independent of the input data - Circuit computational model
Introduced as sovereign joins in 2006 (AAKL06) Best previous solution O(n log^2 n) runtime and O(log n) oblivious storage - Not implemented in practice
Sorting Network: a circuit to obliviously sort an array; we will use it extensively as a building block. Complexity O(n log n) in theory and O(n log^2 n) in practice. First step: Merge the two tables and sort them on the join attribute. Linearly scan the table, count the occurrences of each value in each table, and augment with the count in the other table. We can now compute the size of the result table in a linear scan. Separate the two tables, keeping the count in the other table. Second step: Align the patterns using a formula based on the join count. Conclusion of Oblivious Joins: a single round of interaction can reduce the leakage to only the output size. Hardware encryption does have comparable performance. Final thoughts: You cannot have all three of the following: non-interactive single server; efficient; no leakage |
2021-03-03 |
Setareh |
|
Constant-time Verification for Long-Lifetime Byzantine Fault Tolerant State Machine Replication Main Idea: Permissioned BFT - SMR protocols are commonly used for distributed ledgers Shared Secret - Threshold signature -> constant-time verification
An optimized version of MPSS. Sending out the proposals: 1. Create a new polynomial whose value at 0 is 0 2. Create a set of polynomials of the same degree for the members in the current epoch 3. Generate a new set of encrypted values such that each individual can only decrypt their values. Proposals are sent to an aggregator, and individuals check if they agree. Resharing: New group perspective: Reshare - Receipt messages. Compute new shares P(x) + Q(x) + R_k(x). Share recovery - Broadcasts a share-recovery message
Security Evaluation: Safety: If two statements x and x' are signed with sequence number j, then x = x' - true for the first epoch and any epoch after
Liveness: A properly-issued request will eventually complete - Normal requests
- Membership change requests
The implementation is built as a Rust library |
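The resharing step ("create a new polynomial whose value at 0 is 0, compute new shares P(x) + Q(x)") can be sketched with plain Shamir sharing over a toy field: each refresh adds a random polynomial with zero constant term, so individual shares change every epoch while the shared secret stays fixed. This omits the encryption, aggregation, and proofs of the actual MPSS-based protocol, and the sketch is in Python rather than the implementation's Rust.

```python
import random

P = 2 ** 13 - 1          # small Mersenne prime field (8191), toy-sized

def poly_eval(coeffs, x):
    return sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P

def share(secret, degree, n, rng):
    """Shamir-share a secret as evaluations of a degree-d polynomial."""
    coeffs = [secret] + [rng.randrange(P) for _ in range(degree)]
    return {i: poly_eval(coeffs, i) for i in range(1, n + 1)}

def refresh(shares, degree, rng):
    """Proactive refresh: add a random polynomial whose value at 0 is 0."""
    zero_poly = [0] + [rng.randrange(P) for _ in range(degree)]
    return {i: (s + poly_eval(zero_poly, i)) % P for i, s in shares.items()}

def reconstruct(shares):
    """Lagrange interpolation at x = 0 over the points we hold."""
    secret = 0
    pts = list(shares.items())
    for i, (xi, yi) in enumerate(pts):
        num = den = 1
        for j, (xj, _) in enumerate(pts):
            if i != j:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        secret = (secret + yi * num * pow(den, P - 2, P)) % P
    return secret

rng = random.Random(7)
shares = share(secret=1234, degree=2, n=5, rng=rng)
shares = refresh(shares, degree=2, rng=rng)
assert reconstruct(dict(list(shares.items())[:3])) == 1234   # any 3 of 5 still work
```

Old shares from before a refresh are useless together with new ones, which is what gives the protocol its long-lifetime property against mobile adversaries.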
2021-03-10 |
Faezeh |
|
Data sharing: Datasets are often released for researchers to use. How do we prevent linkage attacks while still providing meaningful datasets? Traditional Methods: - Strong assumptions on the knowledge and identification strategies of the attacker
- Evaluating by simulating the attacker based on assumptions
Netflix prize dataset (combined with the IMDb dataset, leaked users' info) Differential Privacy: An output of an algorithm should not be that different if an entry is removed or added. Differentially Private Synthetic Data: Team DPSyn - Second in all three competitions run by NIST
Pre-processing: - Clustering the variables so that similar variables are in the same cluster
- Construct 1-,2-,3-way marginals between all variables in each cluster
- Perturb the counts using Gaussian Mechanism
Post-processing: - Using some techniques to check the consistency among the marginals
- Sampling from the noisy marginals to generate synthetic data
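DPSyn's perturb-then-sample pipeline can be shown in miniature for a single column: compute a 1-way marginal, add noise, clip negatives, normalize, and sample synthetic records. Laplace noise is used here for brevity where DPSyn used the Gaussian mechanism; the function and parameters are invented for the example.

```python
import math
import random

def laplace(scale, rng):
    u = rng.random() - 0.5
    return -scale * math.copysign(1, u) * math.log(1 - 2 * abs(u))

def dp_synthetic_column(values, epsilon, n_samples, rng=None):
    rng = rng or random.Random()
    # 1-way marginal (histogram) of the column.
    hist = {}
    for v in values:
        hist[v] = hist.get(v, 0) + 1
    # Perturb the counts; the sensitivity of a histogram is 1.
    noisy = {v: max(0.0, c + laplace(1.0 / epsilon, rng)) for v, c in hist.items()}
    total = sum(noisy.values()) or 1.0
    # Sample synthetic values from the normalized noisy marginal.
    return rng.choices(list(noisy), [w / total for w in noisy.values()], k=n_samples)
```

The competition entries extend this idea to 2- and 3-way marginals over clustered variables, plus consistency post-processing across overlapping marginals.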
Team Gardn999 - fifth and fourth in competition 2 and 3
DPFieldGroups Algorithm Pre-processing: - Clustering the highly correlated variables
- Perturb the counts using Laplace Mechanism
Post-processing - Decreasing the noisy counts to zero if they are below a threshold
Sampling from the noisy histograms with weighted probability to generate the synthetic data. Team PrivBayes: fifth place in competition 1 and third in 2 and 3. Algorithm: - Calculate the level of correlation between each pair of attributes and save it as a score for each pair
- Perturb the scores using Gaussian Mechanism
- Use noisy scores to create a Bayesian network
- Approximate the distribution of the original data
- Perturb the P marginals via Gaussian Mechanism
- Use the noisy marginals with the Bayesian network
- Generate the synthetic data by sampling tuples
Team RMcKenna - third in competition 1, fourth in 2, and first in 3
- A blend of parametric and non-parametric approaches
Algorithm: - Identify the highly correlated variables on a public data set
- Perturb the 1-,2-,3-way marginals via Gaussian Mechanism
- Use graphical models to determine a model for the data distribution
Team UCLANESL - fourth place in competition 1, fifth in 3
Uses Wasserstein generative adversarial network Algorithm: - clip the discriminator gradient updates to ensure bounded l_2 sensitivity
- add noise from the Gaussian Mechanism to the gradient updates of the discriminator
- The moments accountant technique will abort the WGAN training if the privacy budget has been reached
Comparing parametric and non-parametric approaches: Non-parametric: - similar functions in different algorithms
- much less computational complexity
- more pre-processing work
- assume access to accurate public data
Parametric Approaches: - various functions in different algorithms
- more computation complexity
- less pre-processing work (in some algorithms)
|
2021-03-17 |
Akshaya |
|
ZXAD: Zero-knowledge Exit Abuse Detection for Tor Tor Exit Abuse: - Traffic exits through publicly known exit nodes
- Attackers use Tor to attack services, which causes services to block all Tor traffic
Circuit-level Attacks: - Client tries to send lots of streams through a single circuit
- Exit can rate-limit the number of streams per circuit
Exit-level Attacks: - Client sends multiple streams via multiple circuits
- Traffic from different circuits is unlinkable
Tor-level Attacks: - Client uses multiple circuits and multiple exits
- Exits do not have a global view of the network
ZXAD high level: - Client is allowed to send a specific number of tokens to a server
- Limits the number of unlinkable connections to a destination
- Exits can mitigate Tor-level attacks by giving the server a hash of the token to differentiate different users
Unlinkability: ZXAD relies on the existence of a unique ID for every client
Websites use IP addresses to block abusers outside of Tor - IP addresses are neither permanent nor unique
ZXAD is general enough to be adapted to any unique ID - even ones with low entropy
(We are assuming client has a unique ID) Participants and Threat Model: - Clients are malicious
- Majority of DirAuths are honest
- Exits and servers can collude with malicious DirAuths
Long-term keys are generated and given to DirAuths for signing keys
Periodic signing keys are used for the client
Circuit Token Showing and Verification: Client sends the circuit token and the proof along with the first stream token in a circuit
Circuit token creation is expensive, but can be done offline
Stream Token Showing and Verification: Client sends the stream token for every new destination within a circuit
Performance: ~8.55 ms in client-side computation; 31 seconds on 8 cores of exit-side computation per 10-minute epoch for the busiest node |
2021-03-17 |
Zeinab |
|
A Probabilistic Approach to Recommending Access Control Checks in the Android Framework
Main focus is on the Android Framework Layer
Framework Layer: App developers use this layer to access services (such as GPS)
API Permission Requirements: Many APIs require the caller to hold a set of permissions and pass UID/user checks
Access Control Inconsistency: Say two APIs want to get access to a sink; one API requires root, the other doesn't, so an attacker would just use the lower permission level
There is no regulation of framework-level customization
Tools to detect these inconsistencies:
Kratos: Not path sensitive, does not model user checks; lots of false positives since it detects syntactically inconsistent permission enforcement
AceDroid: Can conduct cross-image inconsistency analysis; path sensitive; considers a larger number of security checks; normalizes security checks
ACMiner: If one path leads to a SecurityException, another will lead to a protected operation - Looks at conditional statements on paths leading to a SecurityException to identify potential authorization checks
- Domain expert filters through list of potential authorization checks
- Performs consistency analysis by uncovering relations between sets of auth checks
Not practical, need to define auth checks
The Problem with Inconsistency Detection: - May exist for a legitimate reason
- e.g. an API that doesn't do permission checks before using an operation, whereas another already does
A New Approach: - Confidence Level: Resource X should require the following protection
Constraint Collector: - Collect hints from the codebase that suggest a 1:1 resource to protection mapping
- Turn these hints into prior-probability constraints
- Collect hints that denote relations between resources
- Turn these hints into implication constraints
Probabilistic Inference Engine: - Transform constraints into factor graph and perform belief propagation
- Output posterior probability
Considerations: - What happens if different constraints contradict one another?
- What happens if a resource is accessed only in one API?
- When should a recommendation be made?
|
2021-03-24 |
Peyman |
Papers: Time Capsule |
Front-running in Blockchains
Front-running is when someone benefits from early access to market information about trading, typically because of a privileged position
In blockchain, everyone has access to privileged info; miners are in a more privileged position
Miners can be bribed via transaction fee / gasPrice
Attack types:
Displacement - Not important to the adversary for the original function call to run after her function
Insertion - Important to the adversary for the original function call to run after her function
Suppression - Run a function and delay the original function call
Mitigations: 1) Transaction Sequencing - FIFO (not possible on a distributed network)
- Trusted third party
- Pseudorandom Sorting
- Transactions themselves
2) Confidentiality Components of a transaction - Code of DApp
- Current state of DApp
- Name of function being invoked
- Parameters supplied to function
- Address of the contract the function is being invoked on
- Identity of the sender
Commit/Reveal Strategy: 1) Commit a hash or encryption of the transaction 2) Reveal transaction details after finalization of the block
Cons: Requires 3 transactions for each order
Other approaches: Time Capsule [Doweck, Eyal 2020]; Identity-Based Encryption |
2021-03-24 |
Jason |
|
Talked about an event he ran with the goal of getting high school students interested in computer security - Simulated a CTF to try to keep people engaged
- Participants had next to no technical background
- Challenges mostly focused on information gathering and using that information to break things
- Overall event went well, and several participants attended future events
|
2021-03-31 |
Xinda |
|
Certified Robust Word Embeddings for Textual Classification via Differential Privacy
Textual Adversarial Attacks, Example 1: Character substitution; character insertion/deletion
Example 2: Multiple character-level changes on a sequence; word substitution
NLP Background (BERT): Pre-training and fine-tuning; word embeddings
Looking at CharacterBERT - More robust than BERT, but still susceptible to adversarial attacks
- instead of looking at words, look at each character
Adversarial Robustness Definition A small change in the input does not alter the scores so much as change the predicted label Certified Robustness via DP in Image Classification - PixelDP: Let A be a randomized scoring function that takes an input x and outputs a response
We say A is (epsilon, delta)-PixelDP
- Bound the l2-norm sensitivity of pre-noise layers
- Add calibrated noises during prediction (and training)
- Test the condition
Challenges in NLP Setting - P-norm cannot be used here. Need a new distance metric to measure perturbations
- Bounding sensitivity of g(.) with regards to the new distance metric is non-trivial
Related Work Certified Robustness of Word Substitution - Bound the sensitivity for a pre-defined adversarial set
Limitation: - Only for word substitution attack
- Adversary set is a set of words selected in advance
- The bound may be loose
Levenshtein distance - How to define meaningful perturbations?
- Distance Metric
- Minimum edit distance: lev("cat", "bat") = 1; lev("dogs", "dg") = 2
Our Solution 1 (Posterior Privacy) - Levenshtein Ball (center C, radius r): represents all points in the ball
- Levenshtein Shell (C, r)
- Determine the maximum perturbation radius r_0
- Randomly draw r with a probability distribution
- Uniformly sample a sequence from the Levenshtein shell
Solution 2 (Distance-aware Mapping) - An ideal mapping maps an input string to a high dimensional space
- We leverage some basis points and the triangle inequality to achieve Lipschitz continuity
- Exponential complexity; bound on # of anchor points
- Hamming distance
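The edit-distance metric above can be computed with textbook dynamic programming; a minimal sketch (my own illustration, not the paper's code):

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or
    substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))           # distances from "" to prefixes of b
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[len(b)]

# matches the talk's examples:
# levenshtein("cat", "bat") == 1, levenshtein("dogs", "dg") == 2
```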
|
2021-03-31 |
Shelly |
|
Real-time Attacks Against Deep Reinforcement Learning Policies Reinforcement Learning Process - Markov decision process: described by state, action, reward, and environment at each time step
- Goal: Find optimal policy that maximizes the cumulative reward/return
Adversarial Examples - Fool classification model into misclassifying certain samples
- Given an input x, we want to find a modified version x* such that x and x* are similar but f(x) != f(x*)
Focusing on attacks where the adversary perturbs states
Motivation - Real-time Attack: Current state-of-the-art attacks compute perturbations using backpropagation or by solving optimization problems iteratively at each time step (too slow!)
Adversary Model - Untargeted, white-box
Adversary attack goal: - Effective attack: lower the victim agent's return
- Efficient attack: can be realized in real-time
- Evades known defences
Adversary Capabilities: - Knows the agent's policies
- Adversary is capable of perturbing states input to the agent
Proposed Attack - UAP-O and UAP-S Use precomputed universal adversarial perturbation mask for attack - The adversary generates adversarial perturbation mask offline
- The adversary only applies the perturbation mask during test time. No other computation is done during test time
Two new attacks: - UAP-S: state-agnostic universal adversarial perturbation
- UAP-O: observation-agnostic universal adversarial perturbation
Attack Design: To generate UAP-S and UAP-O: 1) Adversary first collects a training set by monitoring the environment for one episode 2) Adversary sanitizes the training set by choosing only the critical states using a relative action preference function 3) Generate UAP-S and UAP-O using the victim policy weights
Evaluation - Efficiency: - While FGSM is the most effective attack, it is too slow to be applied in real-time
- The other attack obs-seq-fgsm-wb, while similar to our approach of universal adversarial perturbation, requires the agent to pause the env mid-episode to compute an adversarial mask for the entire game
- Obs-seq-fgsm-wb(U) is our modified version of the attack where the adversarial mask is generated offline instead of mid-game
Evaluation -Effectiveness - FGSM is the most effective attack in all games, but cannot be mounted in real-time
- UAP-S performs better than or comparably to obs-seq-fgsm-wb and obs-seq-fgsm-wb(U)
Future work: Defense Methods: - Detect when an agent is under attack
- Mitigation - Mitigate the damage
Weaker Adversary Model: - Explore possible attack where the adversary only has black-box access
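For contrast with the precomputed UAP masks, here is what a single FGSM step looks like on a toy logistic-regression "policy" (my illustration, not the paper's code): the gradient must be recomputed for every input, which is exactly why FGSM is too slow for real-time attacks.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm(x, y, w, eps):
    """Fast Gradient Sign Method on a logistic-regression scorer:
    move x by eps in the direction that increases the cross-entropy
    loss.  For this model the gradient w.r.t. x is (p - y) * w."""
    p = sigmoid(w @ x)
    grad_x = (p - y) * w
    return x + eps * np.sign(grad_x)
```

A universal perturbation attack instead adds one fixed, offline-computed mask to every state, so no gradient computation happens at test time.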
|
2021-04-07 |
Ezz |
|
Variational AutoEncoders
Phones have a lot of sensors
Implicit and Continuous Auth: When not using the phone, it is still capturing a lot of data
Use this to keep people logged in
Want to be resilient to malicious actors abusing this
Use a VAE to detect outliers and let someone else handle the response |
2021-04-14 |
Emily |
|
Efficient Differential Privacy Mechanisms with High Utility for Prio
What is Prio? - Used by Mozilla to collect user stats
- Aggregate statistics from users using MPC
Overview: - Clients secret share their data with servers, along with a proof that their data is valid
- Servers publish an aggregation of the valid data
Users send data and a secret-shared non-interactive proof (SNIP)
Servers validate the SNIP before adding data to their local aggregate
SNIP provides robustness against malicious clients
Prio ensures privacy of clients assuming one honest server
Does not provide robustness
Problem: Reconstruction attacks can be used to recover individuals' data from aggregate statistics
Adding DP would protect against such attacks
Naive Constructions: Local DP: - Each client adds noise to their data before sending it to servers
Pro: Clients don't need to trust a central party
Cons: Clients need to add the correct amount of noise
Distributed Noise Generation Between Servers: Achieves (epsilon, delta)-DP
Noise generated and added to the result by servers - each server generates N/k bits
- Secret-share these bits with all other servers
Threat Model: - Data curator: honest but curious
- Clients: assume the proportion of misbehaving clients is below some acceptable bound
- Servers: trusted to follow protocol
Ideal construction: - Achieves high utility like the central DP model
- Efficient
DPrio: 1) Each client generates Laplacian noise and secret-shares it with the servers 2) Servers collaborate to randomly choose a particular client's noise 3) Servers aggregate data from all clients and add the chosen client's noise shares 4) Publish result to analyst
Server Noise Selection: Servers use a commit-reveal protocol to select a client's noise - Each server selects a random number
- Commit: Each server computes and publishes the hash of their random number
- Reveal: The servers open their commitments
- Honest servers verify that each server correctly published the hash
- The servers sum all the revealed random numbers mod N
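The commit-reveal selection above can be sketched in a few lines (a local toy of mine: real servers exchange these messages over a network, and the commitment must also bind the server's identity):

```python
import hashlib, secrets

def commit(r: int, nonce: bytes) -> str:
    # hash commitment to r; the random nonce prevents brute-forcing small r
    return hashlib.sha256(nonce + r.to_bytes(8, "big")).hexdigest()

def select_client(num_clients: int, num_servers: int) -> int:
    """Each server picks a random value and publishes a commitment; once
    all commitments are out, values are revealed, verified against the
    commitments, and summed mod num_clients to pick whose noise is used."""
    picks = [(secrets.randbelow(2**32), secrets.token_bytes(16))
             for _ in range(num_servers)]
    commitments = [commit(r, n) for r, n in picks]       # commit phase
    for (r, n), c in zip(picks, commitments):            # reveal + verify
        assert commit(r, n) == c
    return sum(r for r, _ in picks) % num_clients        # selected client index
```

As long as one server picks its value honestly at random, the sum is uniform, so no coalition of the others can bias which client is chosen.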
Privacy: theoretically achieves epsilon-DP - Relies on the security of the secret sharing mechanism
Robustness - a set of malicious clients cannot influence the final aggregate beyond their ability to choose arbitrary valid inputs
- SNIPs require all servers to be honest to verify values
- can only verify that input is a valid noise value, not that it was sampled correctly
Client-DPrio - All clients sample noise from a normal distribution, adjusted based on assumed number of malicious clients
- Servers sum all noise from all clients which results in DP Gaussian noise
Privacy: - Achieves (epsilon, delta)-dp
Malicious clients may submit noise Server-DPrio Privacy: Achieves (epsilon, delta)-DP if at least one server is honest Robustness: - Robust against adversarial clients when all servers are honest (same as Prio)
- Don�t need to worry about noise added from malicious clients
|
2021-04-21 |
Justin |
|
Updates in the World of Signal
All the Numbers are US: Large-scale Abuse of Contact Discovery in Mobile Messengers: - Signal uses phone numbers as the identifiers
- Privacy concerns: Have to upload all of your contacts to signal
- Use hashes before uploading to servers!
- Phone numbers have low entropy and thus don't give you a lot.
- You can just generate a database of every phone number for a country, it only takes 13 hours and 3.3TB of storage
- Authors use a rainbow table and construct a new reduction function
- Authors found that Signal was rate limiting at 120k contacts per day
Solution: - Have a big database of everyone, and a small database of new users
- Rate limit the big database with a small limit
- Rate limit the small database with a large limit
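A toy demonstration of why hashing phone numbers fails: because the numbering plan is so small, the hash can simply be inverted by enumeration. The helper names and the tiny range are mine; the paper's actual attack uses rainbow tables over full country plans.

```python
import hashlib

def sha256_hex(s: str) -> str:
    return hashlib.sha256(s.encode()).hexdigest()

def recover_number(target_hash: str, prefix: str, digits: int):
    """Invert a hashed phone number by enumerating every number in the
    plan -- feasible because phone numbers have so little entropy."""
    for n in range(10 ** digits):
        candidate = prefix + str(n).zfill(digits)
        if sha256_hex(candidate) == target_hash:
            return candidate
    return None
```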
Signal integrating payments in app - Integrating �MobileCoin�
- Takes place inside Intel SGX enclaves (not known for their security)
- Built on Stellar
Takeaways: - Stop trying to make hashing low-entropy identifiers a thing
- Politicians secretly love signal
- Someone should probably figure out private set intersection
- MobileCoin exists; it's probably fine, I don't know
- The Internet doesn't understand software licenses
|
2021-05-12 |
Ian M |
|
Usual strategies are failing us when it comes to COVID
Threat landscape: Fake ventilators; salvaged equipment with non-OEM software; open source medical devices; supply chain security problems; numerous problems with COVID test kits
EU MDR delayed introduction because of COVID
An intriguing security spec in EU MDR: Software must demonstrate risk as low as possible; post-market surveillance to be required for software used in medical devices
Billing fraud: Debunked - hospitals aren't recording COVID as a cause of death to get money; the payment amounts are correct but there is no evidence of upcoding
Distribution of false and misleading information/advertising: 667 investigations launched in Canada so far
Money laundering: Significant fall-off of casino cash
Payment claim fraud: In the US: Paycheck Protection Program (120 defendants so far); Economic Injury Disaster Loans; UI fraud
In Canada: Canada Emergency Wage Subsidy - aimed at employers, no SIN check, no viability check; CERB - aimed at employees, supposed to get EI or CERB, not both (expected to take 4 years to sort out)
Using websites to sell nonexistent PPE or PPE at inflated prices
The synthetic identity problem: A collection of digital artifacts that purport to represent a real person, but there is no real person; valuable for criminals to use for financial fraud
A twist on synthetic identity fraud: Ontario's Support for Families program was abused - 400 bank accounts opened at the same institution in a month
- Ontario benefit applications were submitted by holders of these accounts
Government under pressure to allow more than one payment to a family, so rule changed to allow >5 payments to same application - 10,000 payments made to fake participants
How did they do this? - Mallory was an insider
- Mallory was senior IT leader responsible for the requirements specs for the benefit application program
- Mallory was also the person who recommended the "fix" to allow >5 payments per family
- Mallory was married to the provincial government IT manager responsible for disbursing funds
- Smart enough to wipe the government issued phones, but not remove all digital footprints
- Had outside "help"
The push for speed and simplicity in getting help to people made it too easy to abuse |
2021-05-12 |
Nils |
|
SoK: How Robust is Deep Neural Network Watermarking?
Provenance Verification: Training a DNN is expensive
A single entity provides a DNN that is consumed by many
Model stealing is a threat; the ability to detect stolen models is a powerful deterrent
DNN Watermarking: Watermarking is a method for provenance verification of DNNs
Embeds a message into a model that is extractable using a secret key
Robustness: An attacker cannot derive surrogate models from access to the source model that do not retain the watermark
Watermark Retention: Success rate for matching bits of the embedded and extracted message
Problem - Robustness has only been studied in isolation against a relatively small set of attacks
- No systematic, empirical evaluation of robustness claims
- Uncertainty about a scheme's robustness makes it difficult to trust its deployment
Watermarking Scheme Categories:
Model dependent: requires access to the model
Model independent: no access to the model required
During inference (active): requires control over the source model's deployment
Removal Attack Categories: Input Preprocessing; Model Modification; Model Extraction
Adversary Model Goals: Derive a surrogate model - without retained watermark
- That is well-trained
- And reduce resources needed for the attack
Decision Threshold - Measure the watermarking accuracy as the success rate between the extracted and embedded message
- What is the lowest watermark accuracy needed to verify a stolen model?
- Surprisingly, a method to derive this decision threshold is missing from all surveyed papers
- It is difficult to determine, because we do not know the watermark accuracy of unmarked models
- We propose a generic method to estimate a scheme's decision threshold
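Watermark accuracy as used above — the fraction of matching bits between embedded and extracted messages — is simple to compute (my sketch). An unmarked model should extract roughly random bits, i.e. accuracy near 0.5, which is why a decision threshold is needed:

```python
def watermark_accuracy(embedded, extracted):
    """Fraction of matching bits between the embedded watermark message
    and the message extracted from a (possibly surrogate) model."""
    assert len(embedded) == len(extracted)
    matches = sum(a == b for a, b in zip(embedded, extracted))
    return matches / len(embedded)
```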
Results: A lot of attacks break the watermarking schemes. Is there one attack that breaks all watermarks? There is a dominant attack!
Computational Resources: restricting computational resources is not realistic; 100h of GPU time costs about $43 USD
Dataset Availability: Plays a significant role for the attacker
Availability of Pre-Trained Models: Platforms exist for sharing pre-trained models for free |
2021-05-19 |
Stan |
|
Reputation (or, Why we did the things we did)
What is a reputation system? A rating system (seller reviews, reddit, …) - Collecting and combining opinions on some thing
- The belief a person may have about what some group of people believe about another individual
- A technology long used by human society to organize how people interact with each other
Reputation is used to incentivize good behaviour Problems: - People on the internet can be mean
- We can try to make people less rude, even while letting them be "anonymous" (previous work, AnonRep)
- We can do that in ways that are harder to "cheat", and here are some concrete ideas of what "anonymous" actually means
Covid-19 and the natural experiment - novel virus with new info constantly being discovered
- wide impact with lots of people having reason to discuss it
- It is desired that discussion about it is generally accurate, at least as of the time a post was made
Methods of enforcing good information for Covid-19 - reputation (as a threshold)
- account age (as a threshold)
- link to sources
- Admin's "secret sauce" to combat bots
Reputation was insufficient alone to encourage good behaviour - Reddit style reputation did not work well
Other reputation functions are possible though! First principles of reputation - How much do people trust this individual?
- How much do they do so "now"?
- Are they at least not a troll?
First ideas: - Ask everyone every time I need to know
- Have everyone put their opinions on some kind of bulletin board, and use their most recent posts
How we arrive at PRSONA and AnonRep - semi-trusted servers can make sure scores follow users despite pseudonym changes
- They don't have to be able to track users themselves to do this
- AnonRep uses reddit-style reputation (which makes the problem simpler)
- PRSONA uses a different form (where votes can always be updated, and each user only gets one)
Still interesting unanswered question: - How do users actually interpret reputation scores?
|
2021-05-19 |
Ted |
Links: |
Password Authenticated Key Exchanges (PAKEs) and the "Quantum Annoying" Property
PAKEs: - To do a key exchange, rather than using PKI, use passwords
Typical method: - Use TLS to establish a secure connection, send password in the clear, server checks the hashed password
Problems: - No way for a client to know that the server is handling the password properly: are they really holding only a hash? Are they properly deleting the cleartext after login?
- Is the extra round necessary? (Authenticate server, establish key, then authenticate client)
The solution: integrate the password check into the key exchange itself - less communication
- don't send cleartext password
- have some assurance that server is handling passwords in a reasonable way
Why aren't PAKEs everywhere? - Patents (just expired)
- Lack of standards (IETF draft available now)
- Doesn't fit into current internet infrastructure
Two styles of PAKEs: balanced and augmented Balanced: - Both parties share the same password and mutually authenticate each other
Augmented: - Client has password, server has F(password) where F is a one-way function
We are focusing on balanced PAKEs What is a secure PAKE? - Offline dictionary attacks should never be feasible
- F(pwd), for any function F that anyone can compute, can never be sent to the other party
- Forward secrecy: If pwd is later compromised, old messages can't be decrypted
- It is unavoidable that each online interaction Eve does gives her one guess at the password
CPace Security: - Do Diffie-Hellman using Hash(pwd) as the base point
- Eve sees a group element U and knows that U = hash(pwd)^u
- But she doesn�t know u
- Eve knows that K = Hash(pwd)^(uv) = U^v = V^u
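The scheme above can be sketched as a toy over a multiplicative group (my illustration: real CPace works in an elliptic-curve group with a proper hash-to-group map, and the parties exchange U and V over the network):

```python
import hashlib, secrets

P = 2**127 - 1  # toy prime modulus; NOT a secure choice of group

def hash_to_group(pwd: str) -> int:
    # toy stand-in for a proper hash-to-group map (real CPace hashes
    # the password to an elliptic-curve point)
    return pow(int(hashlib.sha256(pwd.encode()).hexdigest(), 16) % P, 2, P)

def cpace_toy(pwd: str):
    """Both sides derive the generator g from the password, so an
    eavesdropper who sees only U and V never learns the generator."""
    g = hash_to_group(pwd)
    u = secrets.randbelow(P - 2) + 1          # client secret
    v = secrets.randbelow(P - 2) + 1          # server secret
    U, V = pow(g, u, P), pow(g, v, P)         # the only values on the wire
    return pow(V, u, P), pow(U, v, P)         # both equal g^(u*v) mod P
```

Because the generator is password-derived and never transmitted, Eve must solve a fresh discrete log for every password guess — the "quantum annoying" property discussed below.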
2019: CFRG asks for candidate PAKEs for use in IETF protocols, one balanced, one augmented
2020: CPace announced as the balanced winner, OPAQUE as the augmented winner
Considerations in deciding winners: aspects of security proofs, flexibility, efficiency of protocols, …
Can you compromise CPace with a quantum computer? You don't have enough info to solve the discrete log problem, as the generator is never sent
Solving an entire discrete log problem is needed for each guess of the password
Estimates of how long a mature quantum computer would take to solve a discrete log range from seconds to minutes or hours
The above property was dubbed "Quantum Annoying": the only feasible offline dictionary attack is one that requires a discrete log solution for each guess of the password
Extend with a discrete log oracle
Generic Group Model to prove Quantum Annoying-ness |
2021-05-26 |
IanG |
|
Private Blocklist Lookups with Checklist Blocklists are used by browsers to deter users from accessing malicious websites Checks a site against two lists: - certificate revocation list
- safe browsing list
Problem: Send the entire block list to the client: - the block list is huge though (~93 MB)
Browser pings server every time to look in the block list: - The server learns about every site you visit
A filter is constructed such that a negative result is 100% negative, but it allows for false positives
Only do an online lookup when the local filter misses, but this has been shown to leak significant information
Private Information Retrieval: Alice wants to look something up on a database server
The database responds and has no idea which record was queried
2-server PIR: Two servers that have identical copies of the database and don't collude - Alice sends a query to each database
- Each database independently computes a response and replies
- Alice combines the responses to get the record she cares about
Offline-Online PIR Server has to linear amount of work, but the server can do a lot of it up front rather than in real time - Prior to a query, Alice can download a �Hint� <- Takes a linear amount of time. Hint is much smaller than actual database
- Alice can use hint to construct a PIR query that takes the database less time to answer
Does Alice have to get a new hint? What happens if the database gets updated after Alice gets the hint? - Currently after every query, and/or after every database update Alice needs to get a new hint
Puncturable sets: paper: https://eprint.iacr.org/2021/345
New abilities: GenWith(X) generates a key that, when evaluated, generates a set of sqrt(n) elements containing X, where X is not distinguishable from the other elements. Puncture(X) generates a key for the set constructed by GenWith with X removed
Hint generation: Run Eval on the key, take all the records that correspond to the elements of the generated set, and XOR them together. The server does this lambda * sqrt(n) times <- every record should appear in one of these sets
What do queries look like? Alice sends a (different) punctured key, and a (different) random record from the set, to both servers
Each server sends back the XOR of all the records associated with the punctured key, as well as the record that was randomly selected
Alice uses the hint from the server she isn't querying to construct a query, and uses GenWith to construct the other
Puncture both
Alice updates her hint by using the response to the query generated with GenWith |
2021-06-02 |
David |
|
Supersingular Isogeny Key Encapsulation
NIST PQC: - US National Institute of Standards and Technology
- Standardization process for post-quantum public-key crypto schemes
- Several rounds to determine which scheme to standardize
SIKE (Supersingular Isogeny Key Encapsulation) - IND-CCA2 KEM
- Based on Supersingular Isogeny Diffie-Hellman
- Uses the Hofheinz et al. transformation on SIDH to achieve CCA security
Changes for SIKE in third round: - Faster implementations of compressed variants
- Hardware implementations for all parameter sets
- Optimized ARM Cortex M4 implementations
- Updated security analysis
SIKE has by far the smallest public key size - There is a compressed form, but it takes additional computation
Baseline security: O(fourth_root(p)) The case for SIKE: - Small key size
- Straightforward relationship between parameter size and security
- No reduction in concrete security since inception (10 years!)
- Perfectly correct: no errors, reconciliation, or decryption failures
|
2021-06-02 |
Jiaya |
|
Multi-stage Adaptive Authentication and Access Control What is Adaptive Authentication? - An adaptive authentication system is able to automatically modify its behaviour and/or structure in response to changes in its operating environment.
Whether to authenticate: - Reduce the frequency of explicit authentication
How to authenticate: - Choose the best authentication mechanism
When to (re-)authenticate: - Continuous Authentication
- Progressive Authentication
Adaptation Goals: Security - Risk-awareness
- Low false acceptance rate
- Responsiveness
Usability - Efficiency
- Low false rejection rate
- Low power consumption
Adaptation Logic: Condition - Contextual factors: location, time, etc
Adaptation - Structural: (de)activate one or several authenticators
- Parametric: modify parameters and/or change behaviours
Issue 1: All-or-nothing access control - Unlocked: full access; Locked: no access/explicit authentication required
Issue 2: Hierarchical adaptation - Multi-modal continuous authentication
Multi-stage Adaptive Authentication Organize adaptation policies in different stages - Higher uncertainty of identity, stricter access control
|
2021-06-09 |
Cecylia |
|
Network Performance of P2P Proxies
Proxy servers: - Software is run on servers somewhere (usually DigitalOcean, AWS, Greenhost, …)
- Acts as a redirection point, a middleman for traffic
Snowflake: - Users running proxies on home networks or mobile phones
- Share bandwidth with users that are being censored
- No port forwarding!
The Internet is optimized for client-server model - For �user� networks (mobile networks, home networks, etc)
- Download speeds much higher than upload speeds
- VPS speeds:
- Up/down: 1 GB/s
Throughput: limited by the throughput of the lowest-throughput link
Sometimes we want more reliability than what reliable network protocols provide
If a relay drops, the user just connects to another
TCP meltdown problem: - If you build a reliable protocol on top of TCP, duplicate information can be sent, more packets get dropped, and it spirals out of control
Questions: - Is there congestion at the proxy or endpoints?
- Can we tune or choose reliable protocols to perform better in these conditions?
- How can we split traffic between multiple proxy peers?
- Can we assign proxies to clients depending on geographic location?
|
2021-06-16 |
Bailey |
|
Caring about Sharing Trying to go beyond Two-Party Advertising - Private data is shared with advertisers
- Example: Tim Hortons was sharing data with a third party as well as its parent company
Tech companies partner with hospitals
Secure Computation: - Private Set Intersection
- Private Machine Learning
- Differential Privacy
Law and Policy: Privacy policies have extremely broad terms: "partners", "third parties", "affiliates"
How does the overall acceptability vary across different types of multiparty collaborations?
How does acceptability vary in multiparty data collaboration for different user controls?
Survey was conducted - 916 participants
- Each user receives 1 of 12 scenarios and a series of user controls and privacy mechanisms
Types of Multiparty Data Collaboration - Validation
- Two-Way Two-Party Exchange
- One-Way Two-Party Exchange
- Acquisition
- Merger then acquisition
- Many-to-one Exchange
Results: - Concealed consent received the highest proportion of "Completely Unacceptable" answers
- Opt-in consent had the highest proportion of "Completely Acceptable"
Big takeaways from free-form responses: - Some people said it was fine and needed for advertising
- A lot of people hated it, said it often felt like entrapment
- Some said it is what it is
- Biggest thing is data collection should require consent
Take-aways: - Disambiguate Third Parties
- Explicit over Implicit Consent
- Communicating Privacy
|
2021-06-16 |
Yousra |
|
Dissecting Residual APIs in Custom Android ROMs Residual APIs - Residuals may lead to the following:
- Increase code complexity
- Broaden attack surface
- In Android's ecosystem, the security implications of Residuals are not well studied
- This project bridges the gap by performing a large-scale security investigation of Residuals in Android
Residual API is a private API that is not used on a particular device but is used in earlier versions Model Residual APIs - Originally intended for use in specific models with certain functional requirements
- Yet are still defined in other models without those functional requirements
Security Implications of Residuals: - As residuals are deemed unnecessary/ unfunctional, they may be naturally overlooked during integration and version updates
ReM - Analysis techniques that detect and evaluate the risks of custom Residuals Detecting Residual Use Points: - Collect public entry points leading to the API
Detecting Evolution-induced Access Control Errors Unsound Security Features - Locate Undefined custom permissions
- Locate package name checks
- Locate references to Deprecated Access Control Features
Obsolete Access Control Enforcement - Perform a highly-localized inconsistency analysis
Found that most flaws existed in Residual APIs vs. non-Residuals |
2021-06-23 |
Thomas |
|
Is Differential Privacy the Right Defence against Membership Inference Attacks? Membership Inference: Given a model (Target Model), and a datapoint, was the datapoint used to train the model? Membership Experiment: Adv = 2*Pr(Adversary is correct) - 1 = TPR - FPR Standard ML Models are Vulnerable to Membership Inference Attacks Differential privacy allows certain bounds to be proven The bounds are quite loose! Contributions - Investigate prior membership experiments and provide a tighter bound
- Construct a generalized membership experiment
- Evaluate the performance of off-the-shelf MIA
- Bias
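Both bounds can be checked numerically; this sketch assumes the earlier bound from prior membership experiments is Adv <= e^eps - 1, and the tighter bound is Adv <= (e^eps - 1 + 2*delta) / (e^eps + 1):

```python
import math

def prior_bound(eps):
    # assumed form of the earlier, looser bound
    return math.exp(eps) - 1

def tight_bound(eps, delta):
    # the tighter bound presented in the talk
    return (math.exp(eps) - 1 + 2 * delta) / (math.exp(eps) + 1)

# For eps = 2, delta = 1e-5: the prior bound (~6.39) is vacuous, since
# advantage can never exceed 1, while the tighter bound (~0.76) still
# constrains the adversary.
```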
Tighter bound: Adv <= (e^eps - 1 + 2*delta) / (e^eps + 1) |
2021-06-23 |
Rasoul |
|
S.E.A.L. Simple Encrypted Arithmetic Library - Homomorphic Encryption Library
- Developed by Microsoft
Homomorphic Encrypted data can have computation performed on it S.E.A.L. Supports two encodings: - Polynomial Encoding
- Batch Encoding (Allows for SIMD operations)
Noise Budget: - Each ciphertext has a noise budget
- As operations are performed, more noise is introduced
- Determined by N and q
Modulus switching: - A chain of modulus primes is used to reduce noise after operations
- When should you use modulus switching? It's not obvious
- Compilers can optimize for this
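A toy model of the noise budget (an illustration only, not SEAL's actual formula): the budget is roughly log2(q / noise), multiplication grows the noise sharply, and modulus switching trades ciphertext-modulus bits for a smaller absolute noise:

```python
import math

def budget_bits(q, noise):
    """Remaining noise budget, in bits, under the toy model."""
    return max(0.0, math.log2(q) - math.log2(noise))

q = 2 ** 200        # assumed coefficient modulus for this sketch
fresh_noise = 2 ** 10

fresh = budget_bits(q, fresh_noise)
# A multiplication roughly squares the noise (plus some overhead),
# eating budget much faster than additions do.
after_mul = budget_bits(q, fresh_noise ** 2 * 2 ** 20)
```

When `after_mul` hits 0, the ciphertext can no longer be decrypted correctly, which is why deep circuits need modulus switching along the prime chain.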
|
2021-06-30 |
Hans |
|
PACStack an Authenticated Call Stack Control-flow attacks: - Exploit memory error to inject shell code into writable memory (usually stack)
- Corrupts return address to redirect execution flow
This can be prevented with modern defences Return address corruption still prevalent - Code reuse attacks (Return orientated programming)
Arm Pointer Authentication - Hardware implementation that ensures pointers are not maliciously replaced by using MACs
- Susceptible to reuse attacks
Reuse attacks: - Swap pointers around to change execution flow to technically valid, but unintended location
PACStack: High-level Generate an authentication token using Pointer Authentication that depends on the previous token and the return address being protected Forms a chain of authentication tokens, making it almost impossible to reuse pointers Collisions can still happen though and allow reuse of the PAC chain How do we prevent this? - Prevent the attacker from recognizing collisions by masking each auth token stored in memory
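The chaining idea can be sketched with HMAC standing in for the Arm PAC instructions (the key and token width below are illustrative; real PAC keys live in CPU registers):

```python
import hmac
import hashlib

KEY = b"per-boot secret key"   # assumed stand-in for the hardware PAC key

def auth_token(prev_token: bytes, ret_addr: int) -> bytes:
    """Bind the current return address to the previous token."""
    msg = prev_token + ret_addr.to_bytes(8, "little")
    return hmac.new(KEY, msg, hashlib.sha256).digest()[:8]  # PACs are short

def push_chain(ret_addrs):
    """Simulate pushing a sequence of return addresses."""
    token = b"\x00" * 8        # chain starts from a fixed initial token
    chain = []
    for addr in ret_addrs:
        token = auth_token(token, addr)
        chain.append(token)
    return chain
```

On return, recomputing the token over the claimed previous token and return address and comparing detects any splice: swapping two return addresses produces a different chain.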
Performance is relatively reasonable, with a worst-case penalty of ~15% |
2021-07-07 |
John |
|
What are Reed Solomon Codes? Error Correction Codes (ECC): Error-correcting codes are codes that carry redundant information, which can be used to reconstruct the data in the presence of noise Reed Solomon ECC - Used in a lot of places (CDs, DVDs, QR codes)
- Operate on blocks of data which are treated as finite field elements
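A minimal Reed-Solomon-style erasure code over a prime field (a sketch: real RS codes work over GF(2^8) and also correct errors, not just erasures). The k message symbols are the values of a degree-<k polynomial at x = 1..k; any k of the n shares recover it by Lagrange interpolation:

```python
P = 257  # prime field modulus, chosen so byte values fit

def lagrange_eval(shares, x):
    """Evaluate the unique degree<k polynomial through `shares` at x."""
    total = 0
    for i, (xi, yi) in enumerate(shares):
        num = den = 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # den^(P-2) is the modular inverse, since P is prime
        total = (total + yi * num * pow(den, P - 2, P)) % P
    return total

def encode(message, n):
    """message symbols become values at x=1..k; parity at x=k+1..n."""
    k = len(message)
    pts = list(enumerate(message, start=1))
    return pts + [(x, lagrange_eval(pts, x)) for x in range(k + 1, n + 1)]

def decode(any_k_shares, k):
    """Recover the message from any k surviving shares."""
    return [lagrange_eval(any_k_shares, x) for x in range(1, k + 1)]
```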
|
2021-07-14 |
Asokan |
|
Intel Software Guard Extensions (SGX) Allows for isolating environments (Enclaves) Implemented in hardware Enclave code/data encrypted by hardware Supports: - Sealing: hardware-supported key to turn unsecured storage into secured storage
- Attestation
Local Attestation: One enclave verifies another on the same device Remote Attestation Enclave Identity (Sealing Identity) - Sealing authority (SA) signs enclaves prior to distribution
Several breaks in the past - GSM A5/1 algorithm
- Relay attacks against NFC payments
Is SGX useless? - Who is the adversary?
- Where is your vulnerable program running?
- There is a deployment push
- TrustZone even more widely deployed/used
|
2021-07-14 |
Hossam |
|
Rowhammer So Far What is Rowhammer? DRAM is a matrix of cells that each hold 1 or 0 - Each cell has a capacitor
Cells are accessed at row granularity and get moved to a row buffer Cells are packed tightly to provide more storage "Hammering" of one row flips bits in adjacent rows (memory integrity violation) Hammering a row is accessing the row repeatedly Rowhammer-based Exploits Breaking out of NaCl sandbox - NaCl assumes validated code does not change
- induce bit flips in indirect jump instructions to jump to hidden shell code
Kernel privilege escalation - Induce bit flips in page tables to give a process access to its own page table
Rowhammer.js - No longer need cache-flush instructions
Remote browser-based attack using javascript Throwhammer Only need to send packets to victim Relies on Remote DMA feature of NICs No longer need to lure victim to malicious webpage RAMBleed Breaks confidentiality Main enabler: likelihood of bit flips depends on bit pattern Carefully arrange aggressor and victim pages in memory Software Defences: - Increasing memory refresh rate which negatively affects performance
- Reactive refresh but is hard to identify which rows are physically adjacent because of proprietary mapping
- Physical isolation using guard rows, but wastes memory
Hardware Defences: - ECC, but has limited error-correction
- Target Row Refresh, but can be circumvented using many sided RowHammer
- Proactive throttling: prevent row activation rate from reaching the level needed for bit flips
Good news: although still a problem today, no Rowhammer attacks have been found in the wild yet |
2021-07-21 |
Akshaya |
|
Privacy-preserving Web Analytics Most websites collect user data for analytics Threat Model: - Only aggregate or encrypted user data is logged
- Reported results are differentially private
- Website operators are trusted not to log user data in clear
- Resistant to compulsion and single snapshot attacks
Uses PS signatures, which are randomizable Frequency of Client Visits - Clients store encrypted connection counts locally
- Server stores a differentially private histogram of frequency of client visits
- Server learns client connection count when a client connects to it
Client-side persistent storage - Server inits a histogram with independent Laplacian noise added to every bin and generates a PS signature key pair
- Client connects for the first time or reset its counter
- Server issues a PS signature
- Server updates its histogram
- Client chooses randomness and randomizes the PS signature
- Client encrypts 1 and the randomized signature using some public key encryption to obtain a token
- Client stores token in its local storage
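The server's noisy histogram initialization can be sketched as below, assuming each client changes one bin count by at most 1 (sensitivity 1, so noise scale 1/epsilon). The stdlib has no Laplace sampler, so one is built via inverse-transform sampling:

```python
import math
import random

def laplace(scale, rng):
    """Sample Laplace(0, scale) via the inverse CDF."""
    u = rng.random() - 0.5
    while u == -0.5:                      # avoid log(0)
        u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))

def init_histogram(num_bins, epsilon, rng=None):
    """Independent Laplace noise in every bin makes releases DP."""
    rng = rng or random.Random()
    scale = 1.0 / epsilon   # sensitivity-1 counting query
    return [laplace(scale, rng) for _ in range(num_bins)]
```

Real client counts are then added on top of this noisy baseline, so the released histogram never reflects any single client exactly.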
|
2021-07-21 |
Andre |
|
Connecting the NOPS: A Quantitative Security Metric for CFI Mechanisms CFI mechanisms restrict control flow to limit jumping to unintended locations in code How do we compare mechanisms? Currently there aren't accurate metrics. Proposal: "NOPS Finder" Identifies paths between gadgets Takes the CFI mechanism into consideration while finding paths between gadgets Derives the metric from the ability to find such paths
2021-09-21 |
Miti |
|
CS458 Spring 2021 A teaching summary & lessons learned - Module design, lectures and videos
- Main changes to the content
- Updated slides from modules 4, 5.2 and 6 as well as created new videos for those topics
- Updated the quizzes related to the modified modules
- Improved Blogtask (added fine-grain rubric) and interactive sessions
- Tools used for teaching were Learn, Teams, Crowdmark and CSsharepoint
- Module Design: Trinity of elements to convey concepts
- a) Intended learning outcomes
- b) Teaching & Learning Strategies
- Interactive activities
- Lectures
- c) Formative/Summative Assessments
- The goal was to create an alignment among these elements
- Example: Concepts discussed and learned in interactive sessions (b) were later re-introduced in assignments (c)
- Content Sources:
- Skipping it here for fairness reasons
- Summary of content changes:
- Module 4
- Organized "Threats" subsection based on interception, interruption & modification threats.
- Included IP spoofing, ingress/egress filtering details, smurfing attacks etc.
- Firewalls: related content to previous sections of the module (vO).
- Explored different firewall architectures to motivate DMZes (F&F). Proxying and MiTM, Tor SOCKS proxy.
- Examples of NIDS/HIDSes & their architecture.
- Module 5
- Basic Internet S&P (module 5): Extending content layer-wise.
- Link layer - WEP.
- Network layer - VPN types, IPSec modes.
- Transport layer - TLS: ClientHello, ServerHello, ClientFinished.
- Application layer - SSH, SMTP & secure email.
- Privacy: Tor message format in terms of encapsulation. Added PIR (as a data minimization PET).
- Modules 6 & 7
- Access control: SQL commands for granting/revoking privileges - DAC, RBAC. MLS databases - polyinstantiation etc. (E&N).
- Updated confidentiality, authentication, availability, auditability sections (IBM, Oracle Docs).
- Clarified what does and does not work for the data inference problem.
- Used existing (not-private) data release examples to motivate k-anonymity & ell-diversity.
- Included examples for k-anonymity & ell-diversity.
- DP: motivated the definition, described sensitivity via examples, introduced the Laplace function, claimed Laplace function provided DP (proof in interactive sessions).
- Module 7: Included Bailey's slides on ethics; focused on cost analysis.
- Videos
- Limited the length to ~20min
- A lot of the work was done on editing, captioning and providing great availability (Microsoft Streams & Learn)
- Quizzes and Blogpost
- Quizzes
- Skipping some parts for fairness reasons
- Started the quiz at the start of the module, rather than 1 week into it.
- Changed to multiple select from multiple choice
- Modules split in 2 parts with 1 quiz per part
- Infinite retries without showing actual answers
- Future terms could use the Retake incorrect questions only option
- Blogpost: Represented 5% of the final grade
- Students had to
- Make a blog
- Comment on other's blogs
- Reply to comments on their blogs
- Fine-grained rubric was added: assimilated criteria in bullet-point from the webpage
- Future terms: consider adopting a coarse-grained rubric, address blogpost/comment deadline issue
- Final Assignment
- Skipping it here for fairness reasons
- Interactive Sessions
- 5 Interactive Sessions on the following topics:
- Wireshark
- SAD DNS Cache poisoning attack
- ADBc, ZKPs
- Data Inference Problem
- Differential Privacy
- (Programming) TA's experience:
- Improve "submit" command
- A1: all-or-nothing scheme: hard to give partial marks
- A2: Firewall IDS: Lots of time & effort spent by programming and sysadmin TAs.
- Looking Ahead
- Active Learning
- Keep Q&A
- Add/intersperse individual or peer activities
- Teaching pedagogies
- Flipped classrooms
- Pre-class content
- In-class activities
- Post-class
- For people looking to teach in the future the following certificates were recommended
- FUT first, can be done as a Master's student
- CUT afterwards
|
2021-09-28 |
Youcef |
|
Toward Post-Quantum Key-Updatable Public-Key Encryption via Supersingular Isogeny - One of the applications is for long term conversation
- Protect the content using UPKE
- Relies on the discrete logarithm, not quantum secure
- Studied viability of UPKE post-quantum
- UPKE is composed of 6 algorithms (KeyGen, Encrypt, Decrypt, GenUpdate, UpdatePrivate, UpdatePublic)
- There are 5 properties that are desired in such an encryption scheme:
- (1) Correctness
- (2) Forward Secrecy
- (3) Post-Compromise Security
- (4) Asynchronicity
- (5) Key Indistinguishability (not always required)
- We have both Symmetric and Asymmetric UPKE, however Asymmetric UPKE needs a homomorphic structure
- A wanted characteristic is IND-CPA (Indistinguishability under Chosen-Plaintext Attack)
- We can add Updatability (adversary can update or choose a key to corrupt)
- SIDH-UPKE
- KLPT* can help with forward security
- Issues:
- No Asynchronicity -> no known solution
- The degree of the isogeny is too big
- Solution: Separate into multiple isogenies
- SIDH-UPKE respects only (1). Once KLPT* is introduced, it also respects (2) and (3)
- CSIDH-UPKE
- Issue: no forward-secrecy
- Updated key centred around original secret key
- Key too large after multiple updates
- Solution: Use known cyclic group of order N. Downside is, the group structure must be known
- CSIDH-UPKE is quantum secure
- (5) can be obtained if the key size is limited.
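The homomorphic structure that asymmetric UPKE needs can be seen in the classical (pre-quantum) ElGamal instantiation, sketched here as a toy over a small multiplicative group (modulus and generator are illustrative): the update is sk' = sk + delta, pk' = pk * g^delta.

```python
import random

P = 2**127 - 1   # group modulus (illustrative)
G = 3

def keygen(rng):
    sk = rng.randrange(2, P - 1)
    return sk, pow(G, sk, P)

def encrypt(pk, m, rng):
    """Standard ElGamal: (g^r, m * pk^r)."""
    r = rng.randrange(2, P - 1)
    return pow(G, r, P), m * pow(pk, r, P) % P

def decrypt(sk, ct):
    c1, c2 = ct
    return c2 * pow(c1, (P - 1) - sk, P) % P   # c2 / c1^sk

def update(sk, pk, rng):
    """Key update: both halves shift by the same delta."""
    delta = rng.randrange(2, P - 1)
    return (sk + delta) % (P - 1), pk * pow(G, delta, P) % P
```

Because the public key can be updated without knowing the secret key, anyone can refresh it; the isogeny-based schemes in the talk aim to replicate exactly this structure post-quantum.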
|
2021-09-28 |
Lindsey |
|
Censorship Circumvention Lox: Protecting the Social Graph in Bridge Distribution - Censorship Circumvention Research:
- Investigating censor behaviour, tools and techniques
- Monitoring censorship
- Designing better circumvention tools
- Finding flaws in circumvention tools
- Notable usefulness example: Dramatic increase in Relay and Bridge connections during the Burma coup when the Internet was shut down.
- Bridge: server that exists outside the censored region
- Techniques used by the censor to find users and terminate bridges
- Enumerating bridges
- Blocking bridges
- Posing as users
- Goal: Protect users, limit censor capacity to block users and bridges
- Prior Work:
- Tor bridge distribution
- Anonymous Distribution by Social Graph
- Leveraging Trust Levels
- A bridge is needed when access is blocked to the Tor Network
- Setup bridge via email with bridge authority
- Use Tor browser for example to connect to the bridge
- This limits censors from enumerating bridges (not 100% effective)
- Average of 12 users per new bridge
- Core concepts:
- Bridge distribution using anonymous credentials to protect users and Social Graph
- Invitations: honest users have friends
- Bridge Authority is issuer and verifier
- Leveraging Trust Levels:
- Accumulate trust with time
- Trusted users can recommend friends
- More trust -> can go to more exclusive and trusted bridges
- Lox:
- Limits enumeration
- Prioritizes protecting user’s Social Graph
- Uses Trust Levels
- Higher Trust Levels allows for higher number of bridges and protocols
- Lvl 0: Untrusted, 1 bridge bucket
- Lvl 1: Trusted, 3 bridge bucket
- Lvl 2: Trusted, Invitations
- Lvl 3: Trusted, Migration eligible
- Lvl 4: Trusted, Migration + Invitation eligible
- Computes MAC for issuing and verification
|
2021-10-05 |
Sajin |
|
Express: Lowering the cost of metadata-hiding communication with Cryptographic Privacy - Metadata-resistant communications
- Why? To protect whistleblowers
- Why are existing mechanisms not sufficient
- Who, when and how much is not protected
- Only content is protected
- Private Information Retrieval (PIR)
- Retrieve document from a pool from remote servers without leaking which document was retrieved
- Express: Uses PIR in reverse to improve metadata hiding communication
- Privacy guarantee for writes but not for reads
- Threat model:
- 2 non-colluding servers A, B that collectively hold the contents of a set mailboxes
- Clients are potentially malicious
- Private writing via PIR:
- N mailboxes
- Content of a mailbox is a vector D in F^n
- Each server holds an additive secret share for this vector for each mailbox
- To write a message:
- Client prepares m, e_i in F^n
- Client splits m into w_a and w_b
- Sends w_a to A and w_b to B
- Servers each add their w to the respective secret share of the mailbox the message is sent to
- Issue: Scales poorly
- Improve efficiency with Distributed Point Functions (DPFs)
- Leverage PRFs under the hood
- Scales logarithmically with N instead of linearly
- Messages can only be read by the mailbox owner to whom it is sent
- Enabled via simple symmetric encryption
- Client requests both servers for the contents of the mailbox to read the messages
- Dependent on honest clients
- Malicious clients can submit malicious writes
- Solution: auditing protocol:
- SNIP: client proves to the server that the message is well-formed
- Prevent targeted disruption:
- Switch mailbox addresses from physical to virtual address (128-bit)
- Client needs to know both physical and virtual address to send a message
- Performance:
- For 1M mailboxes: 101x less communication cost than Riposte on the client side
- Savings come from:
- Better auditing
- Faster DPFs
- Client cost constant irrespective of the number of mailboxes. Riposte scales linearly with N.
- Cover traffic and plausible deniability
- Send dummy messages to cover traffic using low client-side costs
- Conclusion:
- Improves state-of-the-art both asymptotically and in concrete performance numbers
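The naive private-write step (before the DPF optimization) can be sketched as follows: the client secret-shares a one-hot update vector between the two servers, so neither share alone reveals the target mailbox (field modulus is illustrative):

```python
import random

P = 2**61 - 1   # field modulus (illustrative)

def make_write(num_mailboxes, target, message, rng):
    """Split a one-hot update (message at `target`) into two shares."""
    w_a = [rng.randrange(P) for _ in range(num_mailboxes)]
    w_b = [(-x) % P for x in w_a]              # shares of the zero vector...
    w_b[target] = (w_b[target] + message) % P  # ...plus message at target
    return w_a, w_b

def apply_write(store, share):
    """Each server adds its share into its mailbox store."""
    for i, v in enumerate(share):
        store[i] = (store[i] + v) % P

# A mailbox's contents are (store_a[i] + store_b[i]) mod P.
```

Each share on its own is uniformly random; this is exactly the linear-cost scheme that DPFs compress to logarithmic size.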
|
2021-10-05 |
Simon |
|
Mixes - There are 2 types of anonymous communication systems:
- In the presence of a Global Passive Adversary (GPA), TOR cannot protect its users
- Solution: Mixes
- Adding delays, cover traffic and change appearance of messages
- Delays prevent arrival/departure timing
- Cover traffic is there to fool observer as to limit guessing of sender/receiver of a message
- Types of Mixes
- Mixes operating in rounds:
- Threshold Mix: batched with a threshold trigger
- Timed Mix: batched with a time trigger
- Binomial Pool Mix: (can be timed or threshold triggered) probabilistic (alpha chance message is sent, 1-alpha chance it is kept in the pool).
- Others
- Stop-and-Go (SG)
- Continuous Mixes
- Mixnet Topology:
- Cascade
- Free route
- Stratified
- Cover Traffic Types
- End-to-end
- Mix-to-Mix
- Sender/Receiver based
- Loopix
- Stratified mixnet composed of:
- Service providers
- Continuous mixes (Poisson distribution on delays)
- UDP, no circuits
- Real messages
- Drop dummies
- Loop dummies: loops back to service provider
- Mix loop dummies: helps against active attacks
- Low latency (seconds)
- Nym
- Composed of:
- Users and Provider
- Mixnet (like Loopix) and Gateways
- Validators
- Nym blockchain
- It is for profit: Users pay validators
- Users pay validators for a bandwidth certificate
- User sends the certificate to the gateway (which talks to the provider) with the message to prove they paid
- Only gateway and validators know user IP
- Money limits corruption/Sybil attacks
- GPA/corrupt with budget
- Depends on honest users total budget vs GPA budget ratio
- Communication epochs: restructure nodes by shuffling them periodically
- Details
- Testnet has around 9k nodes
- Launch late 2021/early 2022
- Chat app and Crypto wallet included
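The binomial pool mix described above can be simulated in a few lines (a sketch: each round, every pooled message is forwarded with probability alpha and kept otherwise, decorrelating arrival and departure times):

```python
import random

def pool_mix_round(pool, alpha, rng):
    """Forward each pooled message with probability alpha."""
    sent, kept = [], []
    for msg in pool:
        (sent if rng.random() < alpha else kept).append(msg)
    return sent, kept

def run_mix(arrivals_per_round, rounds, alpha, rng):
    """Feed arrivals into the pool and flush probabilistically each round."""
    pool, delivered = [], []
    for r in range(rounds):
        if r < len(arrivals_per_round):
            pool.extend(arrivals_per_round[r])
        sent, pool = pool_mix_round(pool, alpha, rng)
        delivered.append(sent)
    return delivered, pool
```

An observer matching a departure to an arrival now faces a whole pool of candidates, which is the point of the probabilistic delay.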
|
2021-10-12 |
Vasisht |
|
Fault Tolerance of Neural Network in Benign and Adversarial Settings - Reliable Deep Learning
- Deployed for critical applications for long periods of time
- Fault Tolerance
- Goal: NN performs well in presence of node/synapse fault + graceful degradation
- Def: Epsilon-Tolerance
- Epsilon=0: complete tolerance (ideal)
- Epsilon>0: partial fault tolerance (realistic)
- Stuck @0 Fault
- Parameter values/intermediate computation
- Can be caused for example by a fault in MAC unit
- Generalization Error: Metric for fault tolerance
- Low generalization error means higher fault tolerance
- High generalization error means low fault tolerance
- Enhancing Fault Tolerance: 3 approaches
- Traditional Reliability Engineering
- Adding redundancy (bad for resource constrained systems)
- Augmenting synapses and nodes (bad for overparametrized NNs)
- Fault Tolerance as a constraint
- Computationally expensive and not scalable to large NNs.
- Regularization
- Restricts overfitting
- Trade-off between test accuracy and generalization error
- Approach Used:
- 2 models
- Feature extractor (FE)
- Full connected network (FC)
- 2 phase process
- Phase 1: Train feature extractor
- Game 1: Identify and extract robust features
- Game 2: Distributional smoothing of extracted features
- Phase 2: Supervised fine tuning
- Phase 1:
- Game 1:
- FE: identify and extract features
- Generator: Reconstruct input image from FE’s extracted features
- Game 2: Minimax optimization between FE and a discriminator
- Discriminator distinguishes FE’s output from Gaussian distribution
- Evaluation
- Better trade-off between accuracy and generalization error. Better than L1 and L2 regularization.
- Adding noise during training
- Improve fault tolerance
- Can be added with objective to penalize large parameters (adding noise to the gradient)
- Impact of adversarial noise on fault tolerance
- Model overfits
- Higher generalization error compared to SGD/regularization
- Increasing noise results in higher generalization error and lower fault tolerance
- Impact of gradient noise on FT
- DPSGD enhances fault tolerance (+privacy) with a bound on generalization error.
|
2021-10-12 |
Jason |
|
The problem with DNS - Designed a long time ago with a design philosophy that does not align with today’s world anymore
- Can be abused by
- Fragmentation
- Most DNS messages are sent using UDP
- Attacker registers long domain name
- Requests a DNS resolver
- Response exceeds what UDP packet can fit -> 2 fragments are sent back
- Malicious middle box tries to send fake 2nd fragment to DNS resolver
- Caching means all clients are affected
- Solution: DNSSEC
- Use digital signatures to sign sets of resource records
- 2 keys
- Key signing Key
- Zone signing Key
- How to maintain key integrity?
- Construct a chain of trust
- Root key on host machine
- When zone signing key is queried, use trust anchor to verify the signature
- Return of fragmentation for DNSKEY and RRSIG (large records)
- Increases likelihood of fragmentation by a lot
- Firewall can also block fragments -> denial of service
- Fix: Use truncation flag and follow up with a TCP connection
- Current approach of DNS community, keep keys and signature sizes relatively small
- DNS protocol will need to be modified in some way
- Early idea that is being run:
- Move fragmentation to the application layer -> help with the firewall problem
- Use a “pseudo resource record” called RRFRAG when the response is too big for a UDP packet
- Still keep truncated flag + TCP for backwards compatibility
- DNS resolver decides whether to keep using UDP or switch to TCP when receiving truncated flag.
|
2021-10-19 |
Setareh |
|
The Last Mile: High-Assurance and High-Speed Cryptographic Implementations - Goal: Framework for efficient and trustworthy crypto libraries
- Correct
- Protected against side-channels attacks
- Formal security proofs
- Jasmin: Language for speed and compiler formally verified
- Allows for “assembly in the head” programming
- EasyCrypt: Proof assistant for provable security
- Game-based cryptographic proofs
- Enhanced Jasmin
- What was added/improved:
- Added SIMD instructions
- Support for different value sizes
- Instruction distributors
- Embedding of Jasmin in EasyCrypt
- Enhanced Jasmin compiler creates EasyCrypt models for
- Functional correctness
- Security proofs
- Constant time
- Added safety check to Jasmin Compiler
- Safety analysis: By Jasmin Compiler
- No division by 0
- Out-of-bounds array accesses
- Variable initialization
- All loads and stores take place in allocated chunk of memory
|
2021-10-19 |
Emily |
|
Similarity Detection in a sparse dataset setting - Problem: Privately detecting similarities in a sparse data setting
- Ensure differential privacy
- Idea:
- Client encrypts their data and sends it to the server
- Server does nearest neighbour using several K-D trees
- Use interactive DP protocols
- Client receives results
- K-D trees:
- Find nearest points near given data point
- Binary tree where each node has K values (dimensions)
- Alternate each dimension at each depth when traversing down or up the tree
- Element of randomness in tree, 2 trees can look different even with the same starting dataset
- Indices “split” the plane in regions
- Need to back up through the height of the tree at most twice when searching -> O(log n) runtime
- K-D trees useful as long as there are more items than dimensions
- Overall protocol
- Client encrypts data and sends it to the server
- Server constructs several K-D trees of the dataset
- Server finds nearest neighbour in each K-D tree
- Client keeps record of each answer
- Summary
- Solving the problem of privately detecting similarities in a sparse data setting
- Using K-D trees for efficient search
- Interactive protocol with the client
- Assuming a semi-honest server
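The underlying (non-private) K-D tree nearest-neighbour search, which the protocol runs over encrypted data, can be sketched as:

```python
def build(points, depth=0):
    """Build a K-D tree: (point, left_subtree, right_subtree) tuples."""
    if not points:
        return None
    axis = depth % len(points[0])          # alternate dimension per level
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return (points[mid],
            build(points[:mid], depth + 1),
            build(points[mid + 1:], depth + 1))

def nearest(node, query, depth=0, best=None):
    """Return (point, squared_distance) of the nearest neighbour."""
    if node is None:
        return best
    point, left, right = node
    dist = sum((a - b) ** 2 for a, b in zip(point, query))
    if best is None or dist < best[1]:
        best = (point, dist)
    axis = depth % len(query)
    near, far = (left, right) if query[axis] < point[axis] else (right, left)
    best = nearest(near, query, depth + 1, best)
    # back up into the far subtree only if the splitting plane is closer
    # than the current best -- this keeps search O(log n) on average
    if (query[axis] - point[axis]) ** 2 < best[1]:
        best = nearest(far, query, depth + 1, best)
    return best
```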
|
2021-10-26 |
Zeinab |
|
Dissecting Residual APIs in Custom Android ROMs - Original Equipment Manufacturer (OEM) Private APIs
- APIs are added, modified and removed frequently
- OEM private APIs are added and removed erratically
- Consequences of Residual APIs
- Increase in code complexity
- Compatibility issues
- Broaden attack surface
- Historical Residual APIs
- Previously used API
- Defined but not used in at least one image
- Lifespan of an API
- Active Lifespan:
- Residual Lifespan:
- Model (device) Residual APIs
- Originally for use in specific models (example: phone model)
- Residual Safeguards
- Configuration checks
- Access Control Checks
- Residual APIs may be overlooked during updates, leads to:
- Unsound Security Features
- Can allow attacks like keylogger attacks
- Obsolete Access Control Enforcement
- Failure to keep with improving standards
- Large Scale Study performed:
- 628 custom ROMS
- 26,833 distinct OEM APIs
- REM (REMNANT) structure:
- ROM analyzer
- Usage-pattern analyzer
- Risk Identifier
- Generates Residual’s ICFG
- Collect defined security features and see if they are used
- Performs a differential analysis
- Intra-ROM convergence analysis
- Cross-ROM Inconsistency analysis
- LG and Huawei have the highest number of Residual APIs.
|
2021-11-02 |
Florian |
|
Is Differential Privacy what you want to protect privacy in ML? - Need to protect sensitivity of training information for ML models.
- Differential privacy:
- Given random mechanism M, two adjacent inputs A, B: Pr[M(A) in S]/Pr[M(B) in S] ≤ e^ε
- It is a provable guarantee
- Maintained in post-processing
- Differential private mechanisms are composable
- Arbitrary background knowledge/executions/number of changes in the data set.
- Differential private SGD (DP-SGD)
- For each iteration, perform gradient clipping and perturbation
- Clipping limits the impact of any individual input
- DP-SGD assumes the worst case: the one input that causes the reverse direction in each iteration
- The definition of differential privacy assumes that the adversary knows the entire data set except the one input in question
- Empirical Privacy:
- Bound on membership inference
- Membership is a property of the entire data set, not only the single member
- There exists a trade-off between privacy and accuracy
- How do you explain differential privacy?
- Differential identifiability: was a sample part of the training set?
- Use Bayes’ Theorem
- Theoretical privacy: can provide a bound of posterior belief
- Observation
- The DI attack is tight on the bound when using data set sensitivity
- There is a gap when using global sensitivity
- Summary:
- Protection against membership inference means very strong privacy parameters in DP-SGD necessary (might render the training useless)
- To protect against membership inference, ML best practices are better suited than DP-SGD
- Open Questions: What besides DP can we use to measure privacy in ML?
- Explainability is an important aspect
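The clip-and-perturb step of DP-SGD described above can be sketched on per-example gradients (plain Python standing in for a real autodiff framework; the clipping norm and noise multiplier are the usual tunable parameters):

```python
import math
import random

def dp_sgd_step(per_example_grads, clip_c, noise_mult, rng):
    """Clip each gradient to L2 norm clip_c, average, add Gaussian noise."""
    clipped = []
    for g in per_example_grads:
        norm = math.sqrt(sum(x * x for x in g))
        scale = min(1.0, clip_c / norm) if norm > 0 else 1.0
        clipped.append([x * scale for x in g])
    n = len(clipped)
    avg = [sum(col) / n for col in zip(*clipped)]
    # noise is calibrated to the clipping bound, the worst-case
    # contribution of any single example
    sigma = noise_mult * clip_c / n
    return [x + rng.gauss(0, sigma) for x in avg]
```

Clipping is what enforces the talk's point that DP-SGD assumes the worst case: one example can move the update by at most clip_c, no matter its actual gradient.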
|
2021-11-02 |
Xinda |
|
Model Poisoning Attacks and Defences on Federated Learning - Federated Learning
- Multiple owners, single model
- Each client trains locally
- Local models are sent to the server, who then aggregates them to form a global model that is then redistributed.
- Robustness Issues:
- Clients are not trusted
- Attackers can backdoor/lower the accuracy of the global model
- Even a single attacker can cause a lot of damage.
- Existing Defenses:
- Detect and remove outliers from model updates before aggregating
- Poisoning attacks
- Data poisoning attack
- Model poisoning attack
- General Attack Objective: Find malicious update within the space of the benign updates which maximize the distance between the benign and malicious aggregates
- Tailored Attack: Find perturbation vector and a scalar that maximizes the distance between benign and malicious aggregates
- AGR-agnostic attack: keep malicious updates within the scope of the benign updates.
- Experiment:
- MNIST non-iid setting
- Non-iid common in real world
- 100 clients in total, 30 participants
- 20% of clients are malicious
- Attacks have a very significant impact
- Their own defence: Clustering-based Defence
- Clustering limits the impact of the attacker
- Evaluated using the same setting
- Very little accuracy is lost relatively speaking
|
2021-11-09 |
Faezeh |
|
What is Machine Unlearning - Why would we need machines to unlearn?
- Privacy
- Security
- Usability
- Right to be forgotten: mandates that companies take reasonable steps to achieve erasure of a person’s data
- Easy to remove data point from dataset but hard to unlearn what was learned from it
- stochasticity in training
- Sequential training process
- Unlearning: if you learn a data point then unlearning it should bring you back to where you started before you learned said data point
- Solutions?
- Differential privacy: Bound on contribution from each data point
- Re-training? Naive, has a large computational cost
- Statistical query learning
- Does not easily scale to more complex models
- Sharded, Isolated, Sliced and Aggregated training (SISA)
- Fast unlearning
- Weak-learner
- Requires additional storage
- The data is sharded and a separate model is trained per shard on different data points: can just retrain the specific shard
- Can also slice the shards
- Verified Unlearning: User does not trust the service provider. How can the user verify that his data has been unlearned?
- Probabilistic Verification of Machine Unlearning:
- Membership inference attacks
- Backdoor attacks
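The SISA idea can be sketched with a deliberately trivial per-shard "model" (the mean of its shard); the point is only the structure: unlearning a point retrains one shard, not the whole ensemble.

```python
def train_shard(shard):
    """Toy stand-in for training: the shard's mean."""
    return sum(shard) / len(shard) if shard else 0.0

def sisa_train(shards):
    return [train_shard(s) for s in shards]

def aggregate(models):
    """Ensemble the per-shard models."""
    return sum(models) / len(models)

def unlearn(shards, models, shard_idx, point):
    shards[shard_idx].remove(point)                     # delete the point...
    models[shard_idx] = train_shard(shards[shard_idx])  # ...retrain one shard
    return models
```

Slicing within a shard (checkpointing after each slice) cuts the retraining cost further, at the price of extra storage, as the notes above say.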
|
2021-11-09 |
Shelly |
|
Adversarial Policies: Attacking Deep Reinforcement Learning - Reinforcement Learning (RL):
- Agent interacts with the environment
- Sequential decision-making
- Finding optimal policy to maximize reward
- Deep RL:
- Combining RL and Deep Neural Networks (DNN)
- Good with high-dimensionality
- Adversarial Examples:
- Perturbations added to clean input
- In RL, add noise to input state to make the model make a bad decision
- Multiple Agent Environment
- DRL agents playing against each other
- Adversarial Policy:
- Train adversarial agent to exploit weaknesses in the victim agent
- Adversarial Model
- Black-box
- Unlimited queries
- Victim follows a fixed stochastic policy
- Can be modelled as a two player Markov game
- Since victim is fixed, the problem for the adversary can be reduced into a normal Markov Decision Process (MDP)
- 4 games:
- Kick and Defend
- You shall not pass
- Sumo humans
- Sumo ants
- Adversary policy reliably wins against most victim policies and outperforms ZOO baseline in most environments and victims.
- Qualitative Analysis:
- The adversary agent did not learn a more “advanced” playing strategy
- Masked Policies:
- The victim’s observations of the adversary are static
- Improves victim’s win-rate
- For sumo ants, adversarial policies don’t work as well
- Possibly due to low dimensionality
- Defense: Fine-tuning. Two configurations
- Single: only tuned against adversarial policies
- Dual: tuned against adversarial policies and other ZOO policies
- Summary:
- New Threat Model
- Verify that these adversarial policies exist for some environments
- Verify that these adversarial policies win by confusing victims with weird actions instead of stronger policies
|
2021-11-16 |
Peyman |
|
Preventing Frontrunning attacks using Identity-based encryption - Maximal extractable value (MEV): revenue you can get by reordering, censoring or adding transactions in blocks.
- For Ethereum: currently lower bound is 740M USD
- High Gas fees
- Occupation of Blocks
- Endanger the consensus
- Frontrunning: attack to get MEV
- A sends her transaction (tx) to the block proposer
- The proposer adds it to their mempools
- B send his tx to the proposer
- Both A and B are in the mempool
- The proposer can add their own txs before A’s txs even though A’s txs arrived before
- Most of the profit goes to the frontrunners, not the miners
- Example: Crypto-Coin NFT
- An NFT listed at a typo price was bought via front-running, before the sell order could be cancelled
- Existing solutions, and their problems:
- Slippage in DEXes leads to Sandwich attacks
- Flashbots: Good but not really a solution
- Trusted Execution Environment (TEE): Significant latency, storage limit, security issues
- Submarine sends: not cryptographically secure, just for ICOs
- Commit-Reveal: Deciding not to reveal, connectivity and availability
- Threshold Decryption: High bandwidth overhead
- RQ: Can we have front-running protection that works with less bandwidth overhead?
- Identity-based-encryption (IBE)
- Setup:
- A PKG with msk and system-wide public key (pk)
- An ID such as an email address or IP address serves as the recipient’s public identity
- Encrypt: encrypt m under pk and ID: c = E_{pk,ID}(m)
- Extract: S_ID = Extract(ID, msk)
- Decrypt: m = Dec_{S_ID}(c)
- Setup of the structure:
- Players: Clients, Executors, Decryptors
- Communication: smart contracts (Participate, DKG, IBE, Process, Target)
- Adversary: n-t Executors, also not all Decryptors
- 6 Phases:
- 0: Participation
- Executors and Decryptors call Participate + deposit
- Client deposits in Participate contract
- 1: Distributed Key Generation
- 2: Encryption and Commitment
- Client creates a commit contract to encrypt their transaction
- 3: Broadcasting Block Key shares
- 4: Decryption
- 5: Execution
- Processes the contract:
- Validates received transaction(s)
- Validates the commitments in the commit contract
- Sends incentivization feedback to the Participate contract
- Sends the transaction amount to the target contract
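The Setup/Extract/Encrypt/Decrypt interface above can be sketched as a toy in Python. Caveat: in real IBE (e.g. pairing-based schemes) anyone can encrypt to an ID using only the public parameters; this hash-based stand-in cheats by using the master secret inside Encrypt, so it illustrates the interface only and provides no security:

```python
import hashlib
import hmac
import os

def _stream(key: bytes, n: int) -> bytes:
    # Derive an n-byte keystream from key via SHA-256 in counter mode.
    out, ctr = b"", 0
    while len(out) < n:
        out += hashlib.sha256(key + ctr.to_bytes(4, "big")).digest()
        ctr += 1
    return out[:n]

def setup() -> bytes:
    # PKG generates the master secret key (msk). Real IBE also publishes
    # system-wide public parameters; this toy omits them.
    return os.urandom(32)

def extract(msk: bytes, identity: str) -> bytes:
    # S_ID = Extract(ID, msk): the user's decryption key for their ID.
    return hmac.new(msk, identity.encode(), hashlib.sha256).digest()

def encrypt(msk: bytes, identity: str, m: bytes) -> bytes:
    # CAVEAT: real IBE encrypts with public parameters + ID only (via
    # pairings); this toy uses msk and so mimics only the interface.
    s_id = extract(msk, identity)
    return bytes(a ^ b for a, b in zip(m, _stream(s_id, len(m))))

def decrypt(s_id: bytes, c: bytes) -> bytes:
    # m = Dec_{S_ID}(c); the XOR stream cipher is its own inverse.
    return bytes(a ^ b for a, b in zip(c, _stream(s_id, len(c))))

msk = setup()
c = encrypt(msk, "alice@example.com", b"send 5 ETH to Bob")
s_id = extract(msk, "alice@example.com")
print(decrypt(s_id, c))  # b'send 5 ETH to Bob'
```

The identity string and message here are made up; the point is only that decryption keys are derived per-ID from a single master secret, which is what lets the Decryptors in the protocol release per-block keys after the ordering is fixed.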
|
2021-11-16 |
Justin |
|
An Introduction to Software Reverse Engineering - Recap on Compiling
- Get machine code, feed it into the disassembler to get disassembly (close to assembly code).
- Use a decompiler to approximate the original code
- Decompiler/disassembler can be lossy
- Once you have the decompiled C, work backwards to an approximation of the source code
- IDA Pro: the most commonly used tool
- Not open source, expensive
- Ghidra:
- Open source
- Made by the NSA
- Walkthrough of Ghidra
- Difficulty is different from coding: have to find how the data is arranged in the program rather than looking at the code itself
- Reconstruct the function from the decompiled C
- Focus on decoding data, not code itself
- Closer to a gigantic logic puzzle
|
2021-11-23 |
Marian |
|
Multi-Party Minimum Spanning Forests - Definition
- Given a weighted, undirected graph, find a minimum-weight set of edges
- Vertices have to still be reachable if they were before
- Two-Party Computation
- Party 1 owns x, party 2 owns y
- Jointly compute f(x,y)
- Can’t learn more than what the function itself reveals
- MPC Minimum Spanning Forests (MPC-MSFs)
- Vertices public
- Edge set split between two parties
- General Techniques
- Take any MSF algorithm and turn it into a circuit
- Problem:
- Need random memory access
- Oblivious RAM (ORAM)
- Solutions tailored to the MSF problem will be much faster
- Semi-honest O(n)-rounds protocol
- Simulate Kruskal’s algorithm and reveal each edge immediately
- Each party finds its smallest edge, secret-shares it, and the parties compare to find the global minimum
- Secure because the only thing a party learns are edges that belong to the MSF
- MSFs are not necessarily unique
- If no weight occurs more than once, then a graph’s MSF is unique
- Other direction does not hold
- Use canonical order of edges to break ties
- More MSF properties:
- Assume each weight is unique:
- Then for each vertex, the smallest incident edge is always part of the MSF
- The previous property yields Boruvka’s Algorithm:
- For each vertex, find the smallest incident edge
- Merge vertices for which an edge was selected
- Repeat
- Runtime is O(log(n)) rounds, with O(n) minima computed in total
- Performs much better in high latency networks
- O(log(n))-rounds semi-honest protocol
- Same as O(n)-rounds version but this time with Boruvka’s algorithm
- Random MSFs
- All edges have the same weight
- MSF=SF
- Canonical ordering has high impact on the resulting MSF
- Random MSF: choose random edge ordering
- Alternatively: Choose random secondary weights
- Simple solution
- Jointly generate random secondary weights and reveal them
- Now can execute any MSF protocol
- Not secure, reveals too much information
- Even if only the ordering of a party’s own revealed edges is known
- O(log^2(n))-rounds protocol
- Saves time by reading/writing multiple entries in memory simultaneously
- Use another MSF algorithm with O(log(n))-rounds
- Doesn’t reveal values in the middle of the protocol
- Mixed Protocol
- Useful if the number of duplicate weights is small
- Tries to reveal selected edges as early as possible
- Switches to non-revealing protocol only when necessary
- Summary
- Unique edge weights -> O(log(n))-rounds
- Few repeated weights -> Mixed protocol
- Many repeated weights -> Protocols without intermediate data
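The plain (non-MPC) version of Boruvka’s algorithm described above can be sketched in Python; the union-find representation and the `(weight, u, v)` edge format are illustrative choices, not from the talk:

```python
def boruvka_msf(n, edges):
    """Minimum spanning forest of an undirected graph via Boruvka.
    n: number of vertices; edges: list of (weight, u, v) with unique
    weights (uniqueness makes the MSF unique and breaks ties)."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    msf = []
    while True:
        # For each component, find its smallest incident edge.
        cheapest = {}
        for w, u, v in edges:
            ru, rv = find(u), find(v)
            if ru == rv:
                continue
            for r in (ru, rv):
                if r not in cheapest or w < cheapest[r][0]:
                    cheapest[r] = (w, u, v)
        if not cheapest:
            break  # no component has an outgoing edge: done
        # Merge components along the selected edges.
        for w, u, v in cheapest.values():
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[ru] = rv
                msf.append((w, u, v))
    return sorted(msf)

print(boruvka_msf(4, [(1, 0, 1), (2, 1, 2), (3, 2, 3), (4, 0, 3), (5, 0, 2)]))
# [(1, 0, 1), (2, 1, 2), (3, 2, 3)]
```

Each iteration of the outer loop is one "round"; since every round at least halves the number of components, the loop body runs O(log(n)) times, which is what the O(log(n))-rounds protocol exploits.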
|
2021-11-30 |
Navid |
|
Some new results on All-or-nothing Transforms and a request for help - t-All-or-nothing Transforms (AONT)
- If you have less than t blocks of information, you can’t get any information/compute anything about the input blocks.
- Entropy doesn’t change before or after you acquire the t blocks
- Moving forward: X are the input blocks, Y the output blocks
- An (N, k, v)-array is an N by k array, say A, whose entries are elements chosen from an alphabet T of order v.
- We say A is unbiased with respect to a subset of columns D if, restricted to the columns in D, every tuple over the alphabet appears equally often.
- A (t, s, v)-AONT is equivalent to a (v^s, 2s, v)-array that is unbiased with respect to a specific subset of columns.
- Asymmetric, rectangular, range, strong, and restricted are different types of AONTs.
- If an AONT satisfies the entropy definition, then it has perfect security.
- AONT: Weak Security:
- We don’t need the entropy to remain the same; we’re happy if all possible values of the t input blocks have non-zero probability.
- If all inputs are equiprobable, then an AONT has perfect security.
- Cases in between:
- Combinatorial (t, s, v)-AONT
- H(X | Y) = H(X′ | Y)
- If inputs are mutually independent, if attacker only has s-t output blocks, entropy of t input blocks is the same.
- What’s next?
- Gathering information regarding usage cases
- Currently thinking of a survey
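As a minimal illustration of the all-or-nothing idea (not one of the (t, s, v)-AONT constructions from the talk), here is a sketch in the spirit of Rivest’s package transform, where missing any single output block hides the key and hence every input block:

```python
import hashlib
import os

BLOCK = 32  # work in 32-byte blocks

def _prf(key: bytes, i: int) -> bytes:
    # Per-index keystream block derived from the package key.
    return hashlib.sha256(key + i.to_bytes(4, "big")).digest()

def aont_forward(blocks):
    """Package transform: y_i = x_i XOR PRF(K, i), plus a final block
    y_last = K XOR H(y_1..y_s). Without every output block, K (and so
    every x_i) stays hidden."""
    key = os.urandom(BLOCK)
    ys = [bytes(a ^ b for a, b in zip(x, _prf(key, i)))
          for i, x in enumerate(blocks)]
    digest = hashlib.sha256(b"".join(ys)).digest()
    ys.append(bytes(a ^ b for a, b in zip(key, digest)))
    return ys

def aont_inverse(ys):
    # Recover K from the last block, then undo the per-block XOR.
    *body, last = ys
    digest = hashlib.sha256(b"".join(body)).digest()
    key = bytes(a ^ b for a, b in zip(last, digest))
    return [bytes(a ^ b for a, b in zip(y, _prf(key, i)))
            for i, y in enumerate(body)]

xs = [os.urandom(BLOCK) for _ in range(4)]
assert aont_inverse(aont_forward(xs)) == xs
```

This corresponds roughly to the t = 1 end of the spectrum under computational (not information-theoretic) assumptions; the combinatorial (t, s, v)-AONTs in the talk make the stronger, unconditional unbiasedness guarantee.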
|
2021-12-07 |
Nils |
|
Fooling Adversarial Watermark Transformer for Text Provenance Verification with Adversarial Examples - OpenAI GPT-3 Language model
- Scaled parameters significantly -> improved accuracy greatly
- Human baseline: 95% on LM dataset, GPT-3 getting close
- Using model to generate code from sentences describing it
- Scared of potential misuses of GPT-3
- Large-scale online radicalization and recruitment
- Model can be used as an API (too big to use locally)
- Existence of a content filter that can detect potentially dangerous topics
- Text Provenance Verification
- ~ Watermarking Text
- Using synonyms
- Hiding & revealing network
- Rule-based methods or AWT (Adversarial Watermark Transformer)
- Threat Model
- Requirements
- Efficacy
- Secrecy: Indistinguishability
- Robustness
- Attacker Capabilities
- Black-box API
- No access to non-watermarked text
- Attacker’s Goal
- Preserve the service’s output utility, but remove the watermark
- Evaluated Attacks
- Random noise
- Re-watermarking
- Discriminator Training
- AWT:
- Language model loss
- Semantic loss
- Hiding & Revealing Network and a Discriminator
- Discriminator used to discriminate watermarked from non-watermarked text
- Can encode 4 bits per 80 words
- There are some grammatical failure modes
- Model is pretty diverse in how it replaces words
- Attacks
- Showed that these attacks don’t remove the watermark
- They only degrade the sentences
- Weakness in AWT
- No inherent source of randomness during training
- Pre-trained encoder
- Breaking AWT’s robustness
- Train a surrogate network, generate an adversarial attack. Use transferability. Not sure if it works yet.
- Breaking Secrecy
- Authors assume we cannot get watermarked, non-watermarked pairs
- However, could sample repeatedly semantically similar passages of text and contrast differences between sentences to reverse-engineer the encoding
- Breaking Efficacy
- Generate input sentence blocks that appear benign, but are modified to a malignant sentence by AWT.
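The synonym-substitution idea mentioned above can be illustrated with a toy rule-based scheme. AWT itself is a learned transformer, and the synonym table here is made up; this only shows how word choices can carry hidden bits:

```python
# Each word with a synonym pair carries one bit:
# first synonym encodes 0, second encodes 1.
SYNONYMS = {
    "big": ("big", "large"),
    "quick": ("quick", "fast"),
    "smart": ("smart", "clever"),
}
# Map every surface form back to its canonical entry.
CANON = {w: k for k, pair in SYNONYMS.items() for w in pair}

def embed(text, bits):
    """Rewrite text, spending one bit at each substitutable word."""
    out, i = [], 0
    for word in text.split():
        key = CANON.get(word)
        if key is not None and i < len(bits):
            out.append(SYNONYMS[key][bits[i]])
            i += 1
        else:
            out.append(word)
    return " ".join(out)

def extract(text):
    """Read the bits back off the chosen synonyms."""
    return [SYNONYMS[CANON[w]].index(w)
            for w in text.split() if w in CANON]

marked = embed("a big dog and a quick smart fox", [1, 0, 1])
print(marked)           # a large dog and a quick clever fox
print(extract(marked))  # [1, 0, 1]
```

The secrecy attack sketched in the talk applies directly to a scheme like this: sampling many semantically similar passages and diffing them would expose exactly these substitution pairs.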
|
2021-12-07 |
Stan |
|
On Variety - Lots of bots and trolls on the internet
- Fake human profile pictures
- How to handle them?
- Human token
- Using government id
- Paying some money?
- Captchas
- NFTs
- Receipt of ownership on a public ledger
- Private Reputation Supporting Ongoing Network Avatars
- Fun
- Related to novelty
- Animals also seek fun
- Covid-19 and boredom (in humans)
- Difficult to study in a laboratory setting
- Humans don’t get bored enough in natural settings to study either
- Boredom researchers have been using the pandemic as an opportunity to gather data on what people actually do when confronted with boredom
- Boredom and variety
- Inconclusive connection between boredom and creativity
- Boredom is a response to a context
- There is value in spending time on variety and novelty in work/research
|