layout: true
name: base
layout: true
class: center

.header[.floatleft[Privacy Preserving IoT].floatright[.white[Christopher Biggs — @unixbigot]]]

.footer[.floatleft[linux.conf.au .hashtag[lca2020] Gold Coast, Jan 2020]]

---
layout: true
name: inverse
class: inverse, middle
template: base

---
layout: true
name: callout
class: middle, italic, bulletul
template: base

---
layout: true
name: pink
class: middle, pink
template: base

---
layout: true
name: toply
class: italic, bulletul
template: base

---
layout: true
name: toplypink
class: italic, bulletul, pink
template: base

---
layout: true
template: callout

.crumb[
# Welcome
]

---
class: center, middle
template: inverse

# Privacy Preserving IoT
## .orange[Escaping the clutches of Big Data]

.bottom.right[
Christopher Biggs, .logo[Accelerando Consulting]
@unixbigot .logo[@accelerando_au]
]

???

TEST SLIDE - The next slide is identical

TURN ON YOUR MIC

START YOUR TIMER

Abstract:

Right now, the state of privacy on the Internet is "we collect every bit of data about you, crosslink everything and use it to manipulate your attention".

The Internet of Things brings the promise (threat?) that "every bit" comes to mean not just everything you did online, but also everything you did in your home, workplace, car and bedroom.

The future is shaped by those who have the strongest vision of what it should be. Right now that's Big Data, which culturally rhymes with "Big Oil", "Big Tobacco" and "Big Pharma". If we don't want the grim meathook future they are cooking up for us, we need to visualise what we DO want and fight harder to make it happen.

So what does a privacy-preserving future look like? How can we construct an internet where the value of information accrues to individuals, not to billionaires?

Many of the pieces are already in place.

* Emerging data processing algorithms such as Private Set Intersection and Homomorphic Encryption
* Personal data enclaves such as the Hub of All Things (HAT) (hubofallthings.com)
* Data exchanges like the Sam project (samnow.com)
* Privacy-first IoT data networks like LoRaWAN and Amazon IoT

Join us as we fit these pieces together and imagine what Internet life (aka "life") might look like when we wrest power back from Big Data.

Private Abstract

This presentation is the result of research and proofs-of-concept that my organisation has done in the course of helping a startup to build a privacy-focused information exchange: an anti-Facebook, if you will.

---
class: center, middle
template: inverse

# Privacy Preserving IoT
## .green[Escaping the clutches of Big Data]

.bottom.right[
Christopher Biggs, .logo[Accelerando Consulting]
@unixbigot .logo[@accelerando_au]
]

???

Hi everyone. And when I say everyone, I mean the [COUNT] mobile devices that I have enumerated and logged in this room. Your attendance will go down on your permanent record. And if you're ever involved in a privacy breach, your insurer will refuse your claim because they know you were here and should have known better, so pay attention.

I come from Brisbane, which is a little village just up the road from here. In that village, I am the descendant of the village blacksmith. My ancestors took their plough to a blacksmith to be mended. Your parents took their car to my father to fix the brakes. Today people bring their robot vacuum cleaner to me to find out if it's secretly recording upskirt videos.

At Accelerando we help people enlist computers and robots to make their lives easier, without that inconvenient part where they rise up and exterminate us. Or at least exterminate our privacy.

---

# Shameless Self-Promotion
## Brisbane Internet of Things Meetup

.tight[
* Evening Presentation and Networking, last Monday of each month
* Afternoon Workshops, 2nd Sundays
* Visitors and speakers welcome!
* Find us on [meetup.com](https://www.meetup.com/Brisbane-Internet-of-Things-IOT-Meetup/)
]

???

But before I go on, some more shameless self-promotion. Another of my hats is host of the Brisbane Internet of Things interest group. This group offers a learning environment to help you, to help all of us, steer toward a livable future, so if this presentation sparks your interest, you are encouraged to join us.

We also run practical workshops, where you can get hands-on with all manner of technologies.

You can find more information on meetup.com.

---
class: center, middle
template: inverse

# PART I: The Dream

???

I mentioned creating a livable future. And one of the things I want to do this afternoon is to give you some perspective on why I think the Internet of Things can be a positive force multiplier.
On Monday, I said the Internet of Things was a silly name, because it faces the wrong way. Sure, devices talk to the internet. But so what, everything does. What's important is what they do for humans. They are human-centered technology. I'll keep using the term Internet of Things today, but I want you to remember that they work for us: we are their purpose, and evil is what happens when you start to treat people as things.

---

# Devices, Communications, Data
## The pillars of the connected future

???

Whatever you want to call it, I have my own personal definition of the concept we call IoT, and it is this:

The Internet of Things is the fusion of inexpensive embedded devices, pervasive wireless communications, and cloud computation, aimed at enhancing human perception and agency.

IoT promises a future where we can extend our vision and reach as far as we want to go, where each human thrives at the center of their own extracortical web of sensors and agents.

---

# My Three Laws of IoT

* **First Law:** Devices must cooperate for the benefit of humans
* **Second Law:** Devices must communicate, and obey instructions
* **Third Law:** Devices must be as simple and reliable as possible

???

If we treat human-centeredness as an axiom, this leads to my three laws of IoT:

* Devices must cooperate for the benefit of humans
* Devices must communicate, and obey instructions
* Devices must be as simple and reliable as possible

If you squint a bit, they look like Isaac Asimov's famous three laws of robotics, which he imagined as a minimum constitution that we would need to embed into our technology to prevent it, to put it bluntly, from killing us.

And these IoT laws serve the same purpose, which is to set the stage for a future where our technology has inherent checks and balances that protect our agency and safety.

---

# My Three* Laws of IoT

* **Zeroth Law:** Devices must be beautiful (or invisible).
* **First Law:** Devices must cooperate for the benefit of humans
* **Second Law:** Devices must communicate, and obey instructions
* **Third Law:** Devices must be as simple and reliable as possible

.footnote[-ish]

???

But Asimov got to retrospectively sneak in a Zeroth Law, so I can too. Devices must be beautiful, or invisible.

If we invite computers into our living rooms, bedrooms and bathrooms, and even our bodies, they can't look like truck parts, and they definitely need to be working for our benefit, not for somebody else's profit.

Last year LCA featured a keynote from Diane Hosfelt, who talked about the struggle to retain control of the insulin pump that keeps her alive. The year before, Karen Sandler told us of her unease at the concept of an implanted defibrillator with wireless communications. This is not a theoretical conundrum.

---

# What they do in the shadows
## Ethical tech is **your** responsibility

???

But out of sight cannot mean out of mind. Somebody needs to think about justice and privacy and safety and all the big lies that we believe in to make society different from a herd.

And as Lana said on Tuesday morning, until and unless we make ethics and safety a software project role in their own right, Somebody is You.

---
layout: true
template: callout

.crumb[
# Welcome
# Dream
# Reality
]

---
class: center, middle
template: inverse

# PART II: The Reality

???

Which brings us crashing to earth in the unglamorous present.

We have, as has become distressingly apparent, collectively built an internet that feeds on attention and sells it, along with our behaviour and habits, to whoever will pay. And it's big business. As the saying goes, if you're not paying for the service, you're not the customer, you're the product.

---

# Advertising is a parasite of attention

.more[[CNBC](https://www.cnbc.com/2019/09/11/google-antitrust-investigation-to-focus-on-advertising-business.html)]

???
Two of the four biggest technology companies in the world are now essentially the hollowed-out corpses of their former selves, animated and controlled by the parasitic wasp of advertising. And I'm not sure that the other two aren't.

I don't know how we get out of the trap we have wandered into, where both our mass media and our interpersonal communication tools are honeypots that feed on our privacy, but I have baked you a cake today that contains a file and a few other tools that might let us tunnel out of here.

Bear with me for a moment, because before we cut the cake I want to frighten you some more.

---

# Big Data
## It rhymes with "Big Oil" and "Big Pharma"

.more[[The Balance](https://www.thebalanceeveryday.com/the-pros-and-cons-of-grocery-store-loyalty-programs-940240), [IT News](https://www.itnews.com.au/news/loyalty-schemes-on-runway-for-data-harvesting-reset-accc-534888)]

???

Have you noticed how there's still a whole step at the supermarket checkout, whether human- or robot-mediated, that badgers you for your loyalty card? They time their staff's toilet breaks; they aren't spending ten seconds of every checkout pestering you about this just so that you can have that flight to Bali. So what are they doing?

Loyalty programs are about collecting data to build a digital double of you, one that can be imprisoned and tortured until it gives up the secret of how to make you buy more stuff.

---

# It's all about the Nellies
## Your Smart TV pays for itself by spying on you

.more[[The Verge](https://www.theverge.com/2019/1/7/18172397)]

???

It's all about the Nellies. I nearly said the Dougies, but those were the old ones. For the confused, I'm talking about Australia's hundred-dollar bill, which you almost never see in the wild. Cash, after all, is so inconvenient to track.

And tracking is everywhere.
Around this time last year, the CTO of a smart TV company admitted in an interview that if you want to buy a dumb TV without any spyware, I mean apps, then they're going to have to charge you MORE to offset the loss of all that juicy monetizable data.

This is not human-centred technology; this is us as the fly in somebody else's web.

---
layout: true
template: callout

.crumb[
# Welcome
# Dream
# Reality
# Nightmare
]

---
class: center, middle
template: inverse

# PART III: The Nightmare

???

We have come a long way down the path of being at the mercy of technology, and some of the technology out there is frighteningly merciless. And there are signs things are getting worse, not better.

---

# Inverse Pokémon Go
## "You have insufficient social credit to ride this tram"

.more[[BBC](https://www.bbc.com/news/world-asia-china-34592186)]

???

The Chinese government has created a system where your entitlement to government services, and even the right to travel, is determined by your behaviour as observed by a pervasive grid of internet and physical surveillance.

Here in Australia, the expanding use of data matching by our own government leaves little doubt that their ambitions lie in the same direction. I fear that we won't be able to rely forever on our bureaucracy's hilarious and total inability to computer as a measure of protection.

---

# The plural of datum is demon
## Robodebts, and Robocops

???

We've had Robodebt, which yanks the social security safety net out from under the disadvantaged based on less-than-reliable inferences about our honesty.

We've seen in the last year how phone and travel records and IMSI catchers can be used to establish who was at political demonstrations, in Hong Kong, in the USA, and coming soon to a city near you.

---

# Would you like fries with that?
## Disclaimer: we'll tell your insurer

.more[[The Economist](http://www.economist.com/node/21556263), [Forbes](https://www.forbes.com/sites/kashmirhill/2012/06/15/data-mining-ceo-says-he-pays-for-burgers-in-cash-to-avoid-junk-food-purchases-being-tracked/#43918ffa1d9e)]

???

And then there's the private sector. If supermarkets think they know everything you buy, you bet they're thinking about monetising that data by selling it to your insurer.

And if you buy all your fruit and veg at the farmers' market and only the junk food at Woolies, what is your health insurer going to infer about your medical risks?

If you develop diabetes later in life, how would you feel if your health insurance company pulled up the records from your supermarket loyalty card, estimated your lifetime junk food intake, and used this to deny you coverage?

If you have an automobile crash, do you really want your insurance company second-guessing whether you ought to have avoided it, based on their post facto reading of vehicle sensor data?

---

.fig60[ ]

.spacedown[
# The Fitbit snitch
]

.more[[Washington Post](https://www.washingtonpost.com/world/the-us-military-reviews-its-rules-as-new-details-of-us-soldiers-and-bases-emerge/2018/01/29/6310d518-050f-11e8-aa61-f3391373867e_story.html?utm_term=.ff184fec6e6d)]

???

Here's a story from the Middle East, from early 2018. This is the outline of a military base in Iraq, showing the fence lines and patrol routes and convoy corridors, assembled from the aggregate Fitbit data of US troops stationed in Baghdad.

Since almost no locals in countries that have been bombed to a dusty crater by the USA have Fitbits, it's easy to identify the ones worn by soldiers.

What's worse is that the US military gave thousands of Fitbits out to their own chunky soldiers, without realising they were giving away more than they intended.
---

# Neighbourhood Watch II, The Creepening

.more[[Washington Post](https://www.washingtonpost.com/technology/2019/11/19/police-can-keep-ring-camera-video-forever-share-with-whomever-theyd-like-company-tells-senator/)]

???

But one particularly concerning story emerged last August, and just kept getting worse.

After a series of reports, rumours and denials last year, it was confirmed in November that hundreds of police departments in the United States have an agreement that grants them warrantless access to recordings from a certain large corporation's video doorbells.

But only in 12-hour chunks. And only if there's been a crime. Any time in the last 45 days. Anywhere within half a mile. And that hardly ever happens, right?

---
layout: true
template: callout

.crumb[
# Welcome
# Dream
# Reality
# Nightmare
# Perspective
]

---
class: center, middle
template: inverse

# INTERMISSION: Perspective

???

With stories like this, you might be considering throwing all your devices into a barrel and burning the lot. Or just laying them out in your backyard and letting global warming take care of it.

But I don't think many of us truly want to put the genie back in the bottle. Do you remember the Yellow Pages? Have you used them lately? Besides as a monitor stand, I mean.

---

.fig35[ ]

.spacedown[
# The unsmart past
## Would you really want to live there?
]

.more[[huffpost.com/entry/radio-shack-ad_b_4612973](https://www.huffpost.com/entry/radio-shack-ad_b_4612973)]

???

Sometimes you get these moments when it's undeniable that we really are living in the future. We're now beyond the years in which science fiction classics like Blade Runner, Snow Crash, and Akira were set. And our world today looks a lot more like those futures than it resembles the dark ages up there.

This is a Radio Shack advertisement from 1985, and almost every single piece of technology in that advert is extinct now. What's more, EVERYTHING on that page is now subsumed by your smartphone.
It's really hard to argue that our lives are worse today than in 1985.

---

.fig70[ ]

.spacedown[
# Anyway, Have a Quokka
]

???

Anyway, I am done talking about how horrible everything is. We're now going to look at what you, as a person, can do to protect your informational integrity, and what we can all do as creators to build systems that protect people's privacy rather than strip-mining it for cash.

So let's take a deep breath, absorb some quokka energy, and take inspiration from Donna's keynote on owning our power to leave the world better than we found it.

---
layout: true
template: callout

.crumb[
# Welcome
# Dream
# Reality
# Nightmare
# Perspective
# Ownership
]

---
template: inverse

# PART IV: A New Hope
### See what I did, there?

???

A long time ago, in a valley far, far away, a bunch of nerds took the idea of the World Wide Web and built a commercial boom on it. And then that blew up, but after a while they built a second one.

The second dot-com boom was the social media revolution. Some heroes of the first boom turned evil, and some new heroes arose, not a few of whom turned out to be villains also.

Dot-com 2.0 promised to sort our email, keep us in touch with our school friends, and let everyone know what we had for breakfast, or what funny thing our pet just did. What we got was targeted advertising and space Nazis.

---

# The Empire takes control

???

Later, those same companies took the pots of cash that they made from monetising our eyeballs and branched out into hardware. Or, much of the time, they gobbled up smaller companies that didn't run fast enough. Siri, Alexa, Nest and Ring all started off as plucky little independent players, assimilated by the galactic empires.
And so the Internet of Things, at least in its consumer manifestation, was strangled at birth by the advertising industry and replaced by hollowed-out corporate changelings. And as Kathy Reid highlighted yesterday, data about our private lives and bodies is frequently being gathered and stored and warehoused, and treated as a corporate asset, no longer our property.

---

.fig80[ ]

.spacedown[
# Whoops
## Have another quokka
]

???

Whoops, there I go preaching fear and paranoia again. I promised I was done doing that. Have another quokka.

---
template: inverse

# Keeping ownership
## Take thy FAANGs from out my heart

???

All right, so how do we maintain our digital integrity in this environment? What sort of approaches might we take to turn this around? Can we wrest control back from the five eyes of Silicon Valley, or do we just burn it down and start afresh? To be honest, I'm going to talk more about the latter from here on.

---

# Don't collect data
## Turn off, and drop out

???

Obviously, one thing you could do to safeguard your online privacy is to check out of the whole rodeo. Abandon the benefits of the internet, along with the problems. Move to a cabin so far back in the woods that you need to bring daylight in by pack mule.

That feels like defeat. And while Dr Brady's keynote from Tuesday reminds us that there is a time to drop your phone in the toilet and go to that protest carrying nothing more technical than a zippered hoodie, I'm not ready to give up fighting for digital ethics.

---

# Don't upload data
## You can't lose control of data you don't share

???

You can't lose control of data you don't share. So the second thing we can consider is to change what data we share or collect.

If your doorbell camera happens to also overlook the neighbour's pool, you can mask out that area of the frame in the camera itself; there are totally cameras out there that already let you do this.
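Masking works best when it happens before a frame ever leaves the device. Here is a minimal sketch, assuming frames arrive as 2-D lists of greyscale pixel values; the frame format, the `(top, left, height, width)` rectangle convention and the function name `apply_privacy_mask` are illustrative assumptions, not any camera vendor's API.

```python
# Illustrative sketch only: the frame format and apply_privacy_mask() are
# hypothetical, not a real camera firmware API.
from typing import List, Tuple

Frame = List[List[int]]           # rows of greyscale pixel values (0-255)
Rect = Tuple[int, int, int, int]  # (top, left, height, width) in pixels


def apply_privacy_mask(frame: Frame, masks: List[Rect]) -> Frame:
    """Return a copy of the frame with every masked rectangle blacked out."""
    out = [row[:] for row in frame]  # copy so the caller's frame is untouched
    rows = len(out)
    cols = len(out[0]) if rows else 0
    for top, left, height, width in masks:
        # Clip each rectangle to the frame so oversized masks are harmless.
        for r in range(max(0, top), min(rows, top + height)):
            for c in range(max(0, left), min(cols, left + width)):
                out[r][c] = 0
    return out


# A 4x6 test frame whose two right-hand columns overlook the neighbour's pool.
frame = [[200] * 6 for _ in range(4)]
masked = apply_privacy_mask(frame, [(0, 4, 4, 2)])
for row in masked:
    print(row)
```

Because the masked pixels are zeroed on the device, before encoding or upload, the pool never reaches anyone's cloud in the first place.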
Our investigations show that it's unlikely that everything you say in front of your home telescreen, I mean voice assistant, is being transmitted to the thought police. But you will observe that the wake word recognition is deliberately fuzzy: any collection of syllables that has a similar shape to a configured wake word may cause your voice assistant to start listening and transmitting.

---

# Do more at the edge
## AI: It's not just for the server any more

???

A special case of not uploading data is to process data in the field instead, and send less raw data upstream.

There's a phase shift happening in IoT at the moment. AI and computer vision used to be things that had to happen at the back end, but this is changing. Lightweight machine learning tools that can run in the field are here, and the processing power of sensor devices is going off the charts.

The Open Source FPGA folks are doing amazing work in making the deep voodoo of field programmable gate arrays more accessible; these technologies open up exciting possibilities to lessen our reliance on cloud services. If you see Tim Ansell or Sean Cross around the conference, ask them about FOMU. If you don't, I guarantee you'll have FOMU FOMO.

---

# Protect data in transit
## Let's, to borrow a phrase, Encrypt.

???

If you're choosing or building IoT solutions, consider how the data gets from the device to the cloud. I've seen video distribution systems that can be viewed and even reconfigured by anyone with network access, and I've encountered one business that is using IoT gateways that connect high-voltage equipment to the internet with exactly zero security.

I want to give a positive shout-out here to Amazon and Apple, who have both essentially outlawed unencrypted connections in their ecosystems. This is absolutely the right thing to do. It's twenty freaking twenty, people: encrypt your bloody connections. Here's a dollar, kid, get yourself a real microprocessor.
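On the device side, "encrypt your connections" mostly means using TLS and never switching certificate checking off. A minimal sketch using Python's standard `ssl` module; the TLS 1.2 floor is an assumed sensible minimum, not a requirement stated in the talk.

```python
import ssl


def make_device_tls_context() -> ssl.SSLContext:
    """Build a client-side TLS context that refuses unverified or outdated peers."""
    # create_default_context() loads the system CA store and already enables
    # certificate and hostname verification; the assignments below make that
    # requirement explicit rather than relying on defaults.
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH)
    ctx.verify_mode = ssl.CERT_REQUIRED
    ctx.check_hostname = True
    # Assumed policy: refuse anything older than TLS 1.2 (Python 3.7+).
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


ctx = make_device_tls_context()
print(ctx.verify_mode == ssl.CERT_REQUIRED, ctx.check_hostname, ctx.minimum_version)
```

To use it, wrap an ordinary socket with `ctx.wrap_socket(socket.create_connection((host, 8883)), server_hostname=host)`, where `host` is a hypothetical broker such as `broker.example.com` and 8883 is the port registered for MQTT over TLS. If the server's certificate does not check out, the handshake fails with `ssl.SSLCertVerificationError` instead of quietly sending data in the clear.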
If you won't do it because I will come and shout at you, do it because a number of countries are enacting product safety laws that force you to not be lazy and incompetent.

---

.fig40[ ]

.spacedown[
# Don't store data
## DELETE DELETE DELETE
]

???

Come to think of it, why the hell does Amazon or Google need to have a record of everything I've ever searched for, or spoken in the presence of my voice assistant? Maybe there's a reason, but I really doubt it's good enough.

As a business, every piece of personal data you store is a liability. As an individual, every piece of your personal data you let someone else store is something you might see on a front page some day.

The cyber insurance industry has a role to play here, much like how the payment card industry has worked to eradicate lazy practices.

---

# Encrypt data at rest
## No, the work experience kid does **not** need to access voice recordings

???

Most of the biggest privacy leaks of all time have been due to poorly secured databases or file stores. It should take more than one mistake to lose control of private data. Don't leave files around where anyone can read them. Don't put your SQL database on the web server. C'mon people, pretend just for a second that privacy is as important to your business as money, and GET SERIOUS.

---

# Encrypt data *at all times*
## "You do not have Need To Know"

???

I even wonder why some cloud services need the key to our data at all. What does Amazon do that requires looking at your doorbell camera footage? It's on their servers so that you can access it, but *they* shouldn't need to. This approach of not even holding the key to a data vault works for password apps, so why not everything else?

Fitbit stores the history of your exercise so that you can gloatingly share it with your friends, or whatever it is that people who run places do with that information, but Fitbit themselves shouldn't need the details.
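The "service never holds the key" idea can be sketched in a few lines: the client derives keys from a passphrase it never uploads, encrypts locally, and the service stores only opaque bytes. This is a toy using only Python's standard library. The SHA-256 counter-mode keystream is a stand-in chosen because the standard library has no block cipher, and it is not a vetted cipher; a real product would use an authenticated cipher such as AES-GCM from a maintained library. The record name and exercise data are invented for illustration.

```python
# TOY CODE: illustrates client-side ("zero-knowledge") encryption only.
# The SHA-256 keystream below is NOT a vetted cipher; use AES-GCM or
# ChaCha20-Poly1305 from a maintained library in real systems.
import hashlib
import hmac
import secrets


def derive_keys(passphrase: str, salt: bytes) -> tuple:
    """Stretch a passphrase into a 32-byte encryption key and a 32-byte MAC key."""
    material = hashlib.pbkdf2_hmac("sha256", passphrase.encode(), salt, 100_000, dklen=64)
    return material[:32], material[32:]


def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256 over key, nonce and a block counter."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def seal(passphrase: str, plaintext: bytes) -> dict:
    """Runs on the user's device: returns a blob the service can store but not read."""
    salt, nonce = secrets.token_bytes(16), secrets.token_bytes(16)
    enc_key, mac_key = derive_keys(passphrase, salt)
    ciphertext = bytes(a ^ b for a, b in zip(plaintext, keystream(enc_key, nonce, len(plaintext))))
    tag = hmac.new(mac_key, salt + nonce + ciphertext, hashlib.sha256).digest()
    return {"salt": salt, "nonce": nonce, "ciphertext": ciphertext, "tag": tag}


def unseal(passphrase: str, blob: dict) -> bytes:
    """Runs on the user's device: checks the tag first, then decrypts."""
    enc_key, mac_key = derive_keys(passphrase, blob["salt"])
    expected = hmac.new(mac_key, blob["salt"] + blob["nonce"] + blob["ciphertext"], hashlib.sha256).digest()
    if not hmac.compare_digest(expected, blob["tag"]):
        raise ValueError("wrong passphrase or tampered data")
    stream = keystream(enc_key, blob["nonce"], len(blob["ciphertext"]))
    return bytes(a ^ b for a, b in zip(blob["ciphertext"], stream))


# The "cloud" is just a dict of opaque blobs; it never sees the passphrase.
server_store = {}
server_store["run-2020-01-16"] = seal("correct horse battery staple", b"5.2 km along the river")
print(unseal("correct horse battery staple", server_store["run-2020-01-16"]))
```

Deriving separate encryption and MAC keys from one PBKDF2 call means a wrong passphrase or a tampered blob is caught by the HMAC check before any decryption is attempted.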
Your browser or phone is drawing the route by overlaying vector data onto map tiles. The service doesn't need to see either.

---

# Facebook 2.0: Pre-Encrypt data for all potential recipients
## That's not you, Zuck

???

I can even envisage a replacement for Facebook where the entire content of updates is opaque to the service. If I'm posting a 30-second cat video for my 300 friends, the space for 300 encryption keys is a rounding error.

---
class: case

# Case Study: Pretty Good Privacy (PGP)
## Disclaimer: great idea, "sub-optimal" UX

???

Calm down, I'm not seriously suggesting you ask your in-laws to PGP-encrypt their Facebook updates. Fraser can do that.

PGP is an email encryption program that was invented in 1991. It was so revolutionary at the time that the US government spent several years attempting to charge the author as an arms trafficker for letting his software leave the country.

Thanks to its famously unfriendly user experience, it didn't exactly take off. In almost 30 years of maintaining a PGP key, I've been asked for it in a business context exactly once.

But this program does exactly the kind of thing with your private messages that most cloud services ought to be doing.

---

## Multi-party public-key encryption

???

Let me briefly explain public-key encryption. This is also called asymmetric encryption. Instead of having one password, you have two. One, the public key, which is like your crypto nerd phone number, can encrypt a message such that only the private key, which you keep on a string around your neck or inside your cybernetic implant, can decrypt it.

If I want to post a picture of the truly righteous eggs Benedict that I had for breakfast, I generate a random password and encrypt my breakfast selfie once with an ordinary symmetric encryption mechanism.

Then I get the public key, that crypto nerd phone number, of everyone I want to be able to see my selfie, and encrypt my random password once for each recipient, using their individual public key.
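Here is the breakfast-selfie scheme as a toy Python sketch, standard library only: the content is encrypted once under a random 32-byte key, and that key is wrapped separately with each recipient's public key. Textbook RSA with freshly generated 1024-bit keys and a SHA-256 keystream stand in for real primitives (for example RSA-OAEP or X25519 key agreement, plus AES-GCM, from a maintained library). Nothing here is safe for real use, and the recipient names and message are invented.

```python
# TOY CODE: textbook RSA and a hash-based keystream are used only to show the
# shape of hybrid, multi-recipient encryption. Do not use for real data.
import hashlib
import secrets


def is_probable_prime(n: int, rounds: int = 32) -> bool:
    """Miller-Rabin probabilistic primality test."""
    if n < 2:
        return False
    for p in (2, 3, 5, 7, 11, 13, 17, 19, 23, 29):
        if n % p == 0:
            return n == p
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for _ in range(rounds):
        x = pow(secrets.randbelow(n - 3) + 2, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False  # found a witness: n is composite
    return True


def random_prime(bits: int) -> int:
    while True:
        # Force the top bit (full size) and the bottom bit (odd).
        candidate = secrets.randbits(bits) | (1 << (bits - 1)) | 1
        if is_probable_prime(candidate):
            return candidate


def make_keypair(bits: int = 1024) -> tuple:
    """Return (public, private) as ((n, e), (n, d)). Needs Python 3.8+ for pow(e, -1, phi)."""
    e = 65537
    while True:
        p, q = random_prime(bits // 2), random_prime(bits // 2)
        phi = (p - 1) * (q - 1)
        if p != q and phi % e != 0:  # e is prime, so this means gcd(e, phi) == 1
            return (p * q, e), (p * q, pow(e, -1, phi))


def xor_keystream(key: bytes, data: bytes) -> bytes:
    """Toy symmetric cipher; applying it twice with the same key decrypts."""
    blocks = len(data) // 32 + 1
    stream = b"".join(hashlib.sha256(key + i.to_bytes(8, "big")).digest() for i in range(blocks))
    return bytes(a ^ b for a, b in zip(data, stream))


def encrypt_for(recipients: dict, content: bytes) -> dict:
    content_key = secrets.token_bytes(32)       # the random "selfie password"
    body = xor_keystream(content_key, content)  # the content is encrypted once
    k = int.from_bytes(content_key, "big")
    wrapped = {name: pow(k, e, n) for name, (n, e) in recipients.items()}  # once per friend
    return {"body": body, "wrapped_keys": wrapped}


def decrypt_as(name: str, private_key: tuple, post: dict) -> bytes:
    n, d = private_key
    content_key = pow(post["wrapped_keys"][name], d, n).to_bytes(32, "big")
    return xor_keystream(content_key, post["body"])


keys = {name: make_keypair() for name in ("alice", "bob")}
public_keys = {name: pub for name, (pub, _priv) in keys.items()}
post = encrypt_for(public_keys, b"truly righteous eggs Benedict")
print(decrypt_as("bob", keys["bob"][1], post))

# Overhead estimate, assuming 2048-bit RSA in practice: 256 bytes per wrapped key.
print(300 * 256, "bytes of wrapped keys for 300 friends")
```

At an assumed 256 bytes per wrapped key, 300 friends cost 76,800 bytes (75 KiB) of key material, which is small next to a 30-second video file that typically runs to megabytes.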
Someone who's allowed to see my selfie can use their private key to unlock their copy of the selfie password.

Oh no, you say, that's too complicated. Shut up, it's the future. Nerds have got this. Your web browser already uses this technology every time it loads a page over HTTPS. We've had the technical capability to do this since I was six years old.

---
layout: true
template: callout

.crumb[
# Welcome
# Dream
# Reality
# Nightmare
# Perspective
# Ownership
# Safety
]

---
template: inverse

# Safety at Rest
## Decentralise the cloud

???

Why do we even give our data to cloud services? We put our money in banks because they're insured. If they get robbed, we get our money back. But if a cloud service sells our data to anyone who can pay, or forgets to set a password on their MongoDB, we don't get our privacy back.

---

# Go into the privacy business yourself
## Maintain your own cloud vault

???

So let's consider some alternatives to centralising data storage.

There's a story that Robert Heinlein used as a metaphor in one of his novels. There's this old man who's employed to polish the brass cannon out front of the town's courthouse. Someone asks him about his retirement plans. "I've saved a little money," he says. "I'm going to buy a cannon and go into business for myself."

Running your own cloud data storage might sound a bit like that. Is it really any different to having someone else store your data? Well, if others only have at-need access to your data, it's harder for them to resell it, to lose it, or to stockpile it.

And of course you encrypt it at rest with multi-party encryption, so if you've agreed to share something with Bob and Alice but not Carol, then there's no loss if Carol hacks your server.

---

# Your own on-premises vault

???

It doesn't even need to be in the cloud. Maybe what you want is a safe full of memory chips with an antenna.
Right now, loading a web page can already involve hundreds of HTTP requests to dozens of servers, in order to load up all the tracking cookies and advertising crap we adorn our websites with. So I don't think collating data from the individual vaults of dozens of owners in order to assemble some result is a different scale of problem.

---

# Your phone is your vault

???

Heck, your phone could even be your personal data vault. I did a back-of-the-envelope calculation, and the text of my entire Twitter output going back a decade is about the size of one song.

---

# Reel your data back in (GDPR and CCPA)

.more[[FastCompany](https://www.fastcompany.com/90178906/5-key-things-in-californias-new-privacy-law)]

???

What if the cat's already out of the bag: you gave your personal data to a bunch of scumbags, and you're not happy with what they're up to?

Well, if you're European, you have GDPR, the EU General Data Protection Regulation, on your side. You can demand that a company supply what information they have on you, and/or delete it.

There are similar regulations in the pipeline elsewhere; California's equivalent went live this month. It's not quite as powerful, as it doesn't extend to resold data, and it won't give you the right to demand erasure of public interest data, such as the arrest record from that time you got arrested for trying to borrow a quokka from the zoo for the weekend.

---
class: case

# Case Study: The Hub of All Things (HAT)

.more[[hat-lab.org](https://www.hat-lab.org/programs/2018/8/16/hat-hub-of-all-things), [dataswift.io](https://dataswift.io/about), [Ethical Tech Alliance](https://ethicaltechalliance.org)]

???

Now, I'm not making this stuff up about personal data vaults. This stuff already exists.

In the middle of the last decade, a UK consortium of 6 universities researched what they called a Multi-Sided Market Platform, with an implementation called the Hub of All Things, HAT for short.
This effort was spun off into a commercial enterprise now known as dataswift dot io. Last year they sponsored the creation of the Ethical Tech Alliance, which is a network of companies and developers that promote, well, ethical tech.

No connection with these organisations, by the way; I just like their ambition.

---
template: toply

# Ethical Tech Alliance

* Ethical and responsible technology

???

What does ethical tech mean? Well, it means technology that takes responsibility, as Lana urged on Tuesday and Marissa spoke about just now, for the consequences, intended or otherwise, of its use.

--

* Legal compliance with GDPR etc.

???

Technology that recognises users' privacy rights as enshrined in law.

--

* User representation

???

Technology that represents the needs and desires of users, not just corporations.

--

* Privacy-by-design

???

Technology that preserves privacy by design.

--

* Gender, ethnic, and socio-economic representation

???

Technology that ethically handles identity, and is inclusive and empowering.

--

* Environmental sustainability

???

And technology that champions environmental sustainability. Not technology that bricks itself on purpose when you try to recycle it. Looking at you, Sonos.

---

# Instantiate your own data hub (a HAT)
## on the cloud, or your phone

???

Here's how it works. A HAT is your personal data vault. You get to decide what services have access to what data. You choose someone you trust to host your HAT, or you host it on your personal device.

---

# Install plugs which bring data into your HAT

???

You can choose to install plugs which will deposit data into your HAT.

---

# Websites call the hub API when they want your data

???

In the other direction, you grant apps access to data within a certain namespace.
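The grant-and-revoke model can be sketched as a small in-memory vault. This is a hypothetical Python illustration, not the HAT API: `PersonalVault`, its methods, the `namespace/record` path convention and the sample data are all invented for the example.

```python
class PersonalVault:
    """Toy personal data vault with per-app, per-namespace read grants."""

    def __init__(self) -> None:
        self._records = {}  # "namespace/record" -> value
        self._grants = {}   # app name -> set of namespaces it may read

    def deposit(self, path: str, value) -> None:
        """What a plug would call to bring data into the vault."""
        self._records[path] = value

    def grant(self, app: str, namespace: str) -> None:
        self._grants.setdefault(app, set()).add(namespace)

    def revoke(self, app: str, namespace: str) -> None:
        self._grants.get(app, set()).discard(namespace)

    def read(self, app: str, path: str):
        """What a website or app would call; refused unless a grant covers the path."""
        namespace = path.split("/", 1)[0]
        if namespace not in self._grants.get(app, set()):
            raise PermissionError(f"{app} has no grant for namespace {namespace!r}")
        return self._records[path]


vault = PersonalVault()
vault.deposit("fitness/2020-01-16", {"steps": 9120})
vault.deposit("location/2020-01-16", "Gold Coast")
vault.grant("running-app", "fitness")
print(vault.read("running-app", "fitness/2020-01-16"))
vault.revoke("running-app", "fitness")  # later reads by running-app now fail
```

Keeping the check inside `read` means the vault, not the calling app, enforces each grant, which is what makes revoking access later meaningful.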
If you know how permissions have evolved in either Android or iOS, this will look familiar: it's the same general process of apps asking for permissions when they need them, you choosing whether to grant them, and having the option to revoke them later.

---

# Tools provide insight on your data

???

The final pieces of this HAT ecosystem are tools: packaged applications or machine learning models that pull in data from your HAT, do some computation and deposit a result. The important thing here is that the computation happens inside your trusted environment, not somebody else's.

---

# Popularity: the chicken and the egg

???

The challenge with an ecosystem like this is bootstrapping: it won't get popular if nobody implements plugs and APIs, and nobody will do that for a service that isn't popular.

The dataswift folks are addressing this via the Ethical Tech Alliance; the platform provides a way for startups to bootstrap their data ecosystem with less effort, not more.

They have implemented plugs for some of the big services like Facebook and Twitter, which can pull your posts and tweets into your HAT, where you can share them with other services, but it's not clear how anyone would convince a big service to implement the inbound half of the picture.

---
layout: true
template: callout

.crumb[
# Welcome
# Dream
# Reality
# Nightmare
# Perspective
# Ownership
# Safety
# Inversion
]

---
template: inverse

# Invert the power relationship

???

Next, let's go deeper into the potential of inverting the power relationship via trusted computing environments.

Currently, data sharing is a leap of faith. If I share my personal data with some service, I lose control over what happens next: whether that data gets used for what I expect or something else, and whether it gets kept secure or flogged off to a crime syndicate.
You've collected some data, which you use for your own purposes. But it has the potential to be valuable to others, if you could share it ethically. This might be agricultural data, or driver behaviour, or what have you.

---

# I want to use your data

???

There are other entities that want to do things with your data. Maybe insurance companies want to measure your fire or accident risk, or drug companies want to measure efficacy from biomonitor data.

---

# You don't want me to see your data

???

Naturally, you're a little bit creeped out by handing over intimate details of your business operations, your body, or your behaviour.

---

# Impasse?

## Not necessarily

???

So how do we get past this?

---

# Buyers want answers, not data.

???

An insurance company might just want to know your percentile rank for certain driving characteristics, to adjust your premium. A drug company might simply want to know if there's a correlation between time of day and some measurable symptom. The point is, there is often a simple aggregate value that is desired; once the code is written and the calculation is done, nobody actually needs to look at the raw data.

---

# Proctored Computation

## "If the mountain will not come to Muhammad, Muhammad must go to the mountain"

???

So let's turn the problem around. Rather than one party supplying the data to the party who wants to run the code, the code comes to the data. Or, if there are multiple data suppliers, the code and the data all go to a trusted third party. The algorithm runs, and the answer (which of course can't be "print out all the data") goes back to the party that wants it. Yes, it's kind of like Java applets all over again.

---
class: case

# Case Study: The Sam project

## And strawberry fields (forever)

???

Now, this is an example where I do have an interest. The Sam project has been operating an ecosystem like this since 2013, and they've been my client for just under a year.
Late last year we built the platform I'm about to describe, and this year we're building a mobile app to put in the hands of users.

---

# I grow strawberries

???

This is an example that came to me from a company that makes software for running fruit farms. It tracks the expenses and revenue of each field, and lets farmers model their income across seasons, locations and varieties. The idea came from a farmer who funded development, and then the product was spun off as its own venture. The goal of a berry farm is to plant the right varieties at the right time to guarantee a steady supply across the harvesting season, and to keep labour costs under control.

---

# You grow strawberries

???

It would be great, of course, if a farmer could have access to information beyond their own fences: maybe a neighbour is growing some wonder variety, or has hit on a winning planting schedule.

---

# Could be .red[love]

## But it's .brown[not]

???

Yeah right, you first. Farmers know their data has value, and they're sick of being screwed over by industry, so they have a natural reluctance to let their data out of their hands.

---

# Am I growing the best variety?

## How does my profit compare to my neighbours?

???

It's not really a competitive thing; the best outcome for the berry industry is that supply is nice and even and there are no gluts. So how can we facilitate this kind of data sharing?

---

# Let's do the computation without any data sharing

## (We'll use Sam as a trusted third party)

???

As I said, for the past few months we've been working with an Aussie project named Sam to make this possible. Their Sam Ecosystem is an open platform for data sharing and cognition. Sam enables data producers to balance their right to privacy with the opportunity to use and monetise their data. Sam is also a way for data consumers to negotiate access to the data they desire. Finally, Sam is a platform for data analysts to monetise the use of their algorithms and AI models.
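
The trusted-third-party arrangement can be sketched in a few lines of Python, assuming a single function stands in for the computation enclave. Everything here is hypothetical: the silo records are made-up numbers, and `run_in_enclave`, `percentile_table` and the hand-written output check (a stand-in for JSON Schema validation) are illustrations, not Sam's actual API. The point is only that the buyer's code travels to the data, and nothing leaves unless it matches the declared output type.

```python
from statistics import quantiles

# Made-up records: each silo belongs to a different grower (hypothetical data).
SILOS = {
    "farm_a": [
        {"variety": "variety_a", "region": "SEQ", "yield_t_per_ha": 41.0},
        {"variety": "variety_b", "region": "SEQ", "yield_t_per_ha": 38.5},
    ],
    "farm_b": [
        {"variety": "variety_a", "region": "SEQ", "yield_t_per_ha": 44.2},
        {"variety": "variety_a", "region": "VIC", "yield_t_per_ha": 30.1},
    ],
    "farm_c": [
        {"variety": "variety_b", "region": "SEQ", "yield_t_per_ha": 36.9},
        {"variety": "variety_a", "region": "SEQ", "yield_t_per_ha": 39.7},
    ],
}


def conforms_to_output_schema(result) -> bool:
    """Stand-in for JSON Schema validation of the declared output type:
    {variety name: [lower quartile, median, upper quartile]} and nothing else."""
    return isinstance(result, dict) and all(
        isinstance(name, str)
        and isinstance(row, list)
        and len(row) == 3
        and all(isinstance(x, float) for x in row)
        for name, row in result.items()
    )


def run_in_enclave(silos, query, computation, output_check):
    """Pull matching records, run the buyer's code, release only a checked result."""
    rows = [r for records in silos.values() for r in records if query(r)]
    result = computation(rows)
    if not output_check(result):
        raise PermissionError("result does not match the declared output schema")
    return result  # the raw rows never leave this function


def percentile_table(rows):
    """The buyer's computation: yield quartiles per variety."""
    by_variety = {}
    for r in rows:
        by_variety.setdefault(r["variety"], []).append(r["yield_t_per_ha"])
    # statistics.quantiles needs at least two data points per group
    return {v: quantiles(ys, n=4) for v, ys in by_variety.items() if len(ys) >= 2}


report = run_in_enclave(SILOS, lambda r: r["region"] == "SEQ",
                        percentile_table, conforms_to_output_schema)
print(report)
```

A computation that tried to hand back the raw rows, such as `lambda rows: rows`, is rejected with `PermissionError`. A type check like this blocks the obvious "print out all the data" case, but it cannot rule out every side channel.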
---

# Start with a schema

## Describes a data type (field syntax and semantics)

.more[[json-schema.org](https://json-schema.org)]

???

We start by defining a data schema. This simply describes the format and semantics of the data. We chose the JSON Schema format because, well, it's not XML.

---

# Create a silo

## Cloud or personal

???

For each schema you're going to use, you create a silo. This is the same concept as a HAT: a place that you control where you put your data. Sam operates a big Elasticsearch cluster for hosted silos, each silo being a separate, segregated index; or you can host your own silo.

---

# Bring in data

## Batch or streaming

???

You bring data into your silo either by API or via a message bus. We use MQTT because, well, we use it for everything; it's the garden hose of IoT.

---

# Buyers solicit data

## Schema + Filter (query)

???

Now, somebody wants to do a computation. Perhaps the strawberry growers' alliance is going to produce a variety performance report. They choose the schema representing agricultural yield, and a query that selects strawberries within South East Queensland. The Sam system runs the query and says it can supply 150,000 records from 600 silos.

---

# Buyers select or supply a computation

## Computations specify output schema

???

Now the buyer specifies a computation, in the form of a Python script and a JSON schema. The schema says the script is allowed to output a percentile table.

---

# Buyers propose transaction

## (dataset, computation, price)

???

The combination of the query, the proposed algorithm and the price on offer is like a buy offer on a stock market. Depending on the sell prices set by the data owners, either a transaction happens or it doesn't. You can leave an offer out there hoping the sellers crack, or revise your offer.

---

# Sellers agree (or not)

???

When a transaction happens, Sam generates what we call a data carnet. In the dirt ages this was a legal document allowing importation of goods.
In the data age, it's a digital certificate containing the query, the algorithm, and the addresses of the silos. It embodies the agreement to share data, and like other digital certificates it has a period of validity.

---

# Data is moved to a computation enclave

## (or maybe streamed)

???

To do the computation, I present my carnet to a computation enclave. This is an Apache Spark cluster hosted on AWS, either existing or newly spun up for the purpose. The enclave uses the carnet to obtain access to the silos and pulls in the data it needs. I've glossed over some details: it's possible the transaction is more like a futures contract, where data that does not yet exist will eventually be delivered via an Apache Kafka stream.

---

# Computation occurs

## Result type checked

.footnote[Yes, you with your hand up, I know you know Gödel's Theorem]

???

However we get the data, the computation runs and its output is checked against the schema. And yes, I know it's provably impossible to be 100% certain there isn't some side channel leaking data, but that's a calculated risk inherent in the system.

---

# Result provided

## Nobody (besides the enclave operator) saw the complete data

???

At the end of the day, the data owners get paid, the buyer gets their answer, the enclave gets destroyed, and everybody is happy. Provided the enclave operator isn't evil, of course.

---
layout: true
template: callout

.crumb[
# Welcome
# Dream
# Reality
# Nightmare
# Perspective
# Ownership
# Safety
# Inversion
# Spooky
]

---
template: inverse

# Spooky Computation at a Distance

## Can we dispense with trust?*

.footnote[No blockchains, I promise]

???

Now we get to the freaky part. Is there a way to do something like the exercise I just described without a trusted third party? Or without even copying any data? Well, this is an active area of research, and there are two techniques that have gone from theoretical to practical in the last 10 years.
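
Of those two techniques, private set intersection is the easier one to sketch. The classic Diffie-Hellman-style construction has each party raise hashed identifiers to its own secret exponent; because exponentiation commutes, the doubly-masked values match exactly when the identifiers match. The Python below is a toy under stated assumptions: the Mersenne-prime modulus, the `Party` class and `private_set_intersection` are made up for illustration, both parties run in one process, and the parameters are far too weak for real use. It is not the API of Google's Private Join and Compute, which builds on a similar commutative-masking idea over elliptic curves and adds homomorphic encryption to sum the associated values.

```python
import hashlib
import math
import secrets

# Toy modulus: the Mersenne prime 2**127 - 1. Far too small (and the wrong kind
# of group) for real security; it only shows the mechanics.
P = 2**127 - 1


def hash_to_group(identifier: str) -> int:
    """Map an identifier to a nonzero residue mod P."""
    digest = hashlib.sha256(identifier.encode()).digest()
    return int.from_bytes(digest, "big") % (P - 1) + 1


def new_secret_exponent() -> int:
    # An exponent coprime to P - 1 makes x -> x**k mod P a bijection,
    # so distinct hashes stay distinct after masking.
    while True:
        k = secrets.randbelow(P - 3) + 2
        if math.gcd(k, P - 1) == 1:
            return k


class Party:
    def __init__(self, identifiers):
        self.identifiers = list(identifiers)
        self._key = new_secret_exponent()  # never leaves this party

    def masked(self):
        """My identifiers, hashed and raised to my secret exponent."""
        return [pow(hash_to_group(x), self._key, P) for x in self.identifiers]

    def remask(self, values):
        """Raise someone else's masked values to my secret exponent too."""
        return [pow(v, self._key, P) for v in values]


def private_set_intersection(you: Party, me: Party):
    # 1. You send me H(x)^a; I return H(x)^(ab) in the same order,
    #    so you can line the results up with your own rows.
    yours_double = me.remask(you.masked())
    # 2. I send you H(y)^b in shuffled order; you compute H(y)^(ba).
    mine = me.masked()
    secrets.SystemRandom().shuffle(mine)
    mine_double = set(you.remask(mine))
    # 3. Exponentiation commutes, so the values match exactly when the
    #    hashes match, i.e. (barring hash collisions) when x == y.
    return [x for x, d in zip(you.identifiers, yours_double) if d in mine_double]


you = Party(["VIN-1001", "VIN-1002", "VIN-1003"])
me = Party(["VIN-2001", "VIN-1002", "VIN-2002", "VIN-1003"])
print(private_set_intersection(you, me))  # ['VIN-1002', 'VIN-1003']
```

In this one-sided variant only "you" learns which identifiers are shared, which is what the insurance fraud check needs; I learn how many identifiers you hold, but not which ones.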
---

# Spooky #1: Private Set Intersection

???

The first of these is called private set intersection. It's the equivalent of doing a database join where you have one table and I have another table, and neither of us learns anything about the rows that aren't in the join.

---

# How do I try it?

## Google Private Join and Compute

.more[[github.com/Google/private-join-and-compute](https://github.com/Google/private-join-and-compute)]

???

There's a C++ library from Google called Private Join and Compute which implements this kind of protocol. If one of us has a list of identifiers and the other has identifiers with associated values, we can compute how many identifiers we share and the sum of the values for those shared rows, without revealing the underlying data. Now, this is not perfect: it's possible for a malicious actor to learn some things about the other party's data by supplying pathological data from their side, but two relatively honest parties can cooperate.

---
class: case

# Case study: Insurance fraud

???

So how might we use this? This is another real-world example I heard about from someone who implemented it.

---

# You (an insurance company) have claims

???

You're a motor insurance company; you have lots of insured vehicles and a number of claims.

---

# I (another insurance company) have claims

???

I'm another motor insurance company, and I have some claims too. We'd both like to know if we're being ripped off by any chancers who've taken out multiple policies on the same vehicle. But it's actually against the law for us to reveal claims data.

---

# Do we each have a claim on the same property?

???

So how can we determine whether there are any duplicate claims? Well, this is one of the simpler problems that private set intersection solves: we choose an identifier, probably the VIN, and our associated value can just be the value 1. We run the protocol, and if our result set is nonempty, those are the vehicles with duplicate claims. You're busted.

---

# Spooky #2: Homomorphic encryption

???

Now if you think that's cool, you'll love this.
What if we could do more things to data without seeing it? What if we could do mathematical operations on it? Homomorphic encryption schemes let you encrypt some data and give it to another party, who performs arithmetic operations on the ciphertext. When you decrypt the result, you get the answer as if the operations had been performed on the plaintext.

---

# What is Homomorphic Encryption?

.more[[HomomorphicEncryption.org](https://homomorphicencryption.org)]

???

People have been trying to do this for 40 years or more, and there's quite a bit of overlap in both researchers and techniques with public key crypto. You can think of homomorphic algorithms as an evolution of public key crypto that includes error correction. A plaintext value encrypts to a much larger ciphertext value that contains an allowance for noise. Arithmetic operations introduce noise into the ciphertext, but as long as you don't add too much, you can still recover the plaintext. The cryptographic algorithms under the hood tend to rely on a problem known as ring learning with errors, and while I know what each of the words in that name means, I'll be honest and tell you that my eyes glazed over after the source papers ran out of Greek letters and started using ancient Dothraki.

---

# How do I try it?

## Microsoft SEAL

## Data61 python-paillier

.more[[Microsoft](https://github.com/microsoft/SEAL), [Data61](https://github.com/data61/python-paillier/blob/master/docs/usage.rst)]

???

Anyway, the good news is that you don't really have to know how it works, but you do have to know that it isn't unlimited magic. You can only do certain operations: typically a fair number of additions and a smaller number of multiplications. If you do too many operations, you exhaust the noise budget and the magic smoke escapes, so you have to estimate in advance how much computation you might want to do. You can't branch or compare.
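
The noise budget is easier to see than to describe. The Python below is *not* ring learning with errors and not what SEAL implements; it is a shrunken version of the simpler "fully homomorphic encryption over the integers" scheme of van Dijk, Gentry, Halevi and Vaikuntanathan, encrypting single bits under a secret odd integer `P`. Each ciphertext hides a small noise term: addition adds noise, multiplication multiplies it, and once the noise passes `P/2` decryption stops being reliable. The seed, the 64-bit key and the noise range are arbitrary toy choices, and the scheme as written is insecure.

```python
import random

rng = random.Random(2020)             # fixed seed so the demo is repeatable
P = rng.randrange(2**63, 2**64) | 1   # the secret key: a random odd integer


def encrypt(bit: int) -> int:
    """bit + 2*r is the noise term; P*q hides it. Toy sizes, insecure."""
    r = rng.randint(1, 16)
    q = rng.randrange(2**128)
    return bit + 2 * r + P * q


def noise(c: int) -> int:
    """Key holder only: the remainder mod P, centred on zero. It equals the
    true noise term only while that term stays below P/2."""
    t = c % P
    return t - P if t > P // 2 else t


def decrypt(c: int) -> int:
    return noise(c) % 2


# Adding ciphertexts XORs the hidden bits; multiplying them ANDs the bits.
one, zero = encrypt(1), encrypt(0)
print(decrypt(one + zero), decrypt(one * zero))  # 1 0

# Keep multiplying by fresh encryptions of 1 and watch the noise grow.
budget = P.bit_length() - 1
acc = encrypt(1)
trace = []
for step in range(1, 41):
    acc *= encrypt(1)
    trace.append((noise(acc), decrypt(acc)))
    print(f"{step:2d} multiplications: {abs(trace[-1][0]).bit_length():2d} of "
          f"~{budget} noise bits used, decrypts to {trace[-1][1]}")
```

Because noise adds under addition but multiplies under multiplication, the multiplications run out first, which is why schemes allow many more additions than multiplications. With these parameters the first eleven products are guaranteed to decrypt correctly; somewhere after that the bit printed at each step can no longer be trusted.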
Figuring out how to do useful work within these constraints is an active area of research. So far we have protocols to compute means, and even standard deviations. Microsoft have a release called SEAL, the Simple Encrypted Arithmetic Library, which is written in C++. And CSIRO's Data61 have a library that uses a less powerful scheme, Paillier, which can only add encrypted numbers and multiply them by unencrypted constants, but it is written in Python, which is handy if you're doing this on some of the cloud compute platforms.

---
class: case

# Case study: strictly controlled medical data

## Let's call it 'strawberryitis'

???

Now, there's no free lunch: these algorithms tend to be slow and produce large outputs, so you use them sparingly. But there are cases, often in the health arena, where the constraints on data handling are very strict and options are limited.

---

## You (a collection of hospitals) each have patient data

### By law, your data cannot be disclosed or stored in cloud systems

### (even encrypted)

???

Let's say there's a deadly condition known as strawberryitis. A number of local hospitals have seen patients with this disease.

---

# I (a researcher) want to study incidence of some condition

???

We want to know the total incidence of this condition and the distribution of recovery periods. How can we compute this?

---
template: toply

# Homomorphic encryption to the rescue

.more[[Applications, Archer et al.](http://homomorphicencryption.org/white_papers/applications_homomorphic_encryption_white_paper.pdf)]

* I create an encrypted data record (sum and count, possibly more)

???

Well, say I take two integers, which happen to be zeroes, and encrypt them to produce two chunks of gibberish.

--

* I pass record to hospital one, who adds their aggregate data

???

I send these records off to the first hospital, who adds their patient count and the total of their patients' recovery times.

--

* H1 passes record to H2, who adds theirs

???

The data goes to the second hospital, who adds their data.
Now, it's important that the record doesn't come back to me in between, because I could decrypt it after each step and work out the values that each hospital added.

--

* H2 passes to H3, and so on

???

So the record passes around through all the parties...

--

* Hn returns record to me

???

...until it comes back to me, and I can decrypt the accumulators and work out the patient count and the mean recovery time.

---

# Homomorphic encryption protocols let a computation occur where no party can learn the contribution of any other party

### Yet the originating party can decrypt the aggregate result

???

If we set up the data and the cryptosystem properly, and follow the protocol, we can perform a computation where no party can learn the contribution of any other party. I reckon that's pretty freaky.

---

# We are not done

## When the going gets weird, the weird turn pro.

???

We are nearly done. But we are not completely done. There is one more thing I want to tell you about.

---

# CryptoNets

.more[[github.com/microsoft/CryptoNets](https://github.com/microsoft/CryptoNets)]

???

Not only can you do simple calculations on encrypted data, you can run neural net processing on encrypted data. Again from Microsoft, we have a reference implementation which lets you encrypt your data and then send it out for cloud processing by a neural net, which only ever sees the encrypted version.

---

# Your algorithm, my data.

???

So what might we do with that? Well, after we have chipped away in this talk at all the reasons cloud services might give us to justify having unrestricted access to our data, one of the last justifications they have is "but, but, our magic algorithm helps us show you the things you are most interested in". Well, firstly, you can stick your curated timeline where the sun don't shine, but secondly, I have some hope that if CryptoNets get powerful enough, they can solve this too.
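
CryptoNets evaluates a small neural network on encrypted inputs, which needs more than addition. The "your algorithm, my data" arrangement itself can be sketched with a linear model and the additively homomorphic Paillier scheme (the scheme python-paillier implements, and the same additive trick behind the strawberryitis hospital protocol): the data owner encrypts its features under its own key, the model owner combines the ciphertexts with its plaintext weights without decrypting anything, and only the data owner can read the resulting score. This is a from-scratch toy assuming integer features and weights; the fixed Mersenne primes, tiny key size and function names are illustrative, not the python-paillier or SEAL APIs.

```python
import math
import secrets

# Toy key material: two well-known Mersenne primes. Real Paillier keys use
# random secret primes of 1024 bits or more; these are for illustration only.
P, Q = 2**89 - 1, 2**107 - 1


def keygen():
    n = P * Q
    lam = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(P-1, Q-1)
    mu = pow(lam, -1, n)  # modular inverse (Python 3.8+); valid for g = n + 1
    return n, (n, lam, mu)  # (public key, private key)


def encrypt(n: int, m: int) -> int:
    """Enc(m) = (1 + m*n) * r**n mod n**2, with fresh randomness r."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (1 + (m % n) * n) * pow(r, n, n2) % n2


def decrypt(private_key, c: int) -> int:
    n, lam, mu = private_key
    m = (pow(c, lam, n * n) - 1) // n * mu % n
    return m - n if m > n // 2 else m  # read large residues as negatives


def add(n: int, c1: int, c2: int) -> int:
    return c1 * c2 % (n * n)  # Enc(a) * Enc(b) = Enc(a + b)


def scale(n: int, c: int, k: int) -> int:
    return pow(c, k % n, n * n)  # Enc(a) ** k = Enc(k * a)


# Data owner: encrypts its features under its own public key, sends ciphertexts.
public, private = keygen()
features = [3, 7, 2]  # hypothetical integer-encoded features
encrypted = [encrypt(public, x) for x in features]


# Model owner: holds plaintext weights and bias, only ever sees ciphertexts.
def score_encrypted(n, enc_features, weights, bias):
    acc = encrypt(n, bias)
    for c, w in zip(enc_features, weights):
        acc = add(n, acc, scale(n, c, w))
    return acc


enc_score = score_encrypted(public, encrypted, weights=[2, -1, 5], bias=10)

# Data owner: decrypts the single number that came back.
print(decrypt(private, enc_score))  # 2*3 - 1*7 + 5*2 + 10 = 19
```

Paillier can only add ciphertexts and multiply them by plain numbers, which is exactly enough for a linear score. The non-linear activations in a neural network need ciphertexts multiplied together, which is why CryptoNets relies on a more capable scheme.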
---

# TensorFlow on encrypted data

.more[[tf-encrypted.io](https://tf-encrypted.io)]

???

If you're using TensorFlow, the TF Encrypted project is working on bringing encrypted computation into it, including homomorphic encryption via Microsoft SEAL, so that you can train your model on encrypted data and operate it in the same way. I hope you can appreciate that this is really revolutionary.

---

# Encrypted models that come to you

.more[[Alex Ovechko](https://medium.com/@ovechko.056/tensorflow-pretrained-model-encryption-decryption-in-mobile-apps-e3e95209716a)]

???

Finally, we have the possibility of encrypting the model itself. TensorFlow models are designed to be serialisable as Google protocol buffers so that apps can pull them down. There's a link down there about doing that, and work is also happening on incorporating homomorphic encryption into model handling so that the model can be kept secret for commercial reasons. I don't have time to go into the details, which is totally a lie because I just don't understand them yet, but this is an area I'll be watching with interest.

---
layout: true
template: callout

.crumb[
# Welcome
# Dream
# Reality
# Nightmare
# Perspective
# Ownership
# Safety
# Inversion
# Spooky
# Coda
]

---

# Recap

.nolm[
* Big Data === Big Pharma
* You are not a Thing.
* Ownership at Source
* Safety at Rest
* Invert the power relationship
* Spooky computation at a distance
]

???

Okay, we've come to the end, and what I hope I've demonstrated, in the spirit of this week, is that there is actually quite a lot of scope for improving the way we handle private data. There may be little hope of getting some of the big incumbents to change their ways, but there is hope for a new breed of privacy-preserving enterprises. We've looked at ways data owners can retain control of their data. We've talked about keeping tighter control of who has access to data.
Proctored computation allows parties who might not trust each other to rely on a trusted third party, and finally, the exciting, emerging fields of private set intersection and homomorphic encryption allow multiparty computation to occur without requiring a trust anchor.

---

# Thank You, I'm Here All Week

### (Try the BoFs)

## Related talks

- [http://christopher.biggs.id.au/#talks](http://christopher.biggs.id.au/#talks)

.nolm[
- Twitter: .blue[@unixbigot] of .blue[@accelerando_au]
- Email: .blue[christopher@accelerando.com.au]
- Accelerando Consulting - Internet of People-are-not-Things
- https://accelerando.com.au/
]

???

Thank you very much for your attention, and I hope I have helped to make you think about privacy in a different way. These slides will be available on my website, and I'd be delighted to take your questions now or later.