Transcript

Mike Stonebraker On Database As Operating System Datacamp

TITLE: #201 The Database is the Operating System | Mike Stonebraker, CTO & Co-Founder At DBOS
CHANNEL: DataCamp
DATE: 2024-04-25
---TRANSCRIPT---

Mike: So one of the environments that we have DBOS running on is the MIT SuperCloud, which is in Holyoke, Massachusetts. It has 32,000 processors, it has several terabytes, that’s with a T, of main memory, and many terabytes of secondary storage. So the resources that the operating system has to manage have gone up by about six orders of magnitude in the last 40-ish years. So without me saying another word, that makes managing operating system state, which is keeping track of tasks, resources, processes, files, all that stuff, a database problem.

[Music]

Richie: Hi Mike, thank you for joining me on the show.

Mike: Oh, thanks for having me, Richie.

Richie: Wonderful. So you’re most famous for your work on the PostgreSQL database. Can you tell me what you think led to its phenomenal success?

Mike: So first of all, its phenomenal success had almost nothing to do with me. It was picked up in 1995 by a pickup team of programmers who have promoted it and shepherded it ever since, so they deserve most of the credit. And you say, why is it taking over the world? Well, it’s a much better database system than MySQL, and hopefully cream rises to the top. But also, people were kind of afraid when Oracle bought MySQL, and Postgres has remained purely community driven all this time. I think it’s a perfect example of what open source is supposed to be. I’m delighted that the big cloud elephants are pretty much standardizing on the Postgres wire protocol, and so I think it will become a very, very dominant database system. Well, the interface will; there will be lots of implementations.

Richie: Yeah, it really is a community that’s grown and created something amazing, and it’s
still continuing to evolve decades after its introduction. So is there anything that you’re most excited about in the world of SQL databases?

Mike: Well, I wrote a paper in 2007 saying, you know, what goes around comes around, and there aren’t any new data models. Andy Pavlo and I wrote a paper that’s going to appear in SIGMOD Record, 15 years later, and here’s a summary of what the paper says: there aren’t any new data models that are going to get traction, in our opinion, and all the interesting ideas are either in hardware or in new applications.

For example, everyone is moving everything they can to the cloud as quickly as they can, and that’s for all kinds of good reasons. So it seems to me the most exciting thing that I see is that if you’re doing one of these cloud migrations, you have a once-in-a-generation opportunity to try and fix the sins of your predecessor. You can either do a lift and shift, at which point your successor will inherit the sins of your predecessor, or you can refactor, rewrite, and intelligently move to the cloud. I think that’s the most exciting macro trend.

Other things: when you move to the cloud, the cloud basically forces you to have disaggregated storage, and that’s forcing all the database vendors to completely rewrite their stuff. The reason for that is that networking has gotten a whole lot faster than it used to be, and that enables you to do disaggregated storage. The second thing is that the big cloud vendors make it financially very attractive to have software as a service, function as a service, serverless computing. That will encourage all application writers to rewrite their stuff, and it will also encourage database systems to adopt that model. So there are lots of changes driven by the cloud guys.

In terms of new applications, I think: how are machine learning and large language models
going to be supported by database systems? I mean, where’s that going to go? I think it’s the exciting thing to watch. I think genomic databases are going to become much more prevalent, and how are those going to be supported? And then the topic that we’re supposed to talk about is, you know, can the operating system really become a database system? That’s basically a new application area for databases. So I think new application areas are fascinating, and I think the cloud hardware changes are really fascinating. I’m not expecting anything to happen in data models. I think the relational model is the answer, and I don’t see that changing.

Richie: Okay, lots to unpack there. I like your first point about how you inherit the sins of your predecessor. I think this is familiar to anyone who works with data: there’s some sort of horrendous blob there that you look at and say, well, I’m not sure I want to touch that. So the idea that you’ve got to maintain things and improve them as you go along, avoiding the technical debt, seems very important. And then on your last point about databases in the operating system, it does seem like databases are pretty much everywhere, and the operating system is one of the last holdouts. So since your new project is around creating an operating system with a database at its core, can you tell me why operating systems need databases?

Mike: So I have lots of gray hair, and I was a very early user of Unix, in 1974, on a PDP-11/40: one processor, 48K of main memory, not M or G, K, and 20 megabytes of secondary storage. One of the environments that we have DBOS running on is the MIT SuperCloud, which is in Holyoke, Massachusetts. It has 32,000 processors, it has several terabytes, that’s with a T, of main memory, and many terabytes of secondary storage. So the resources that the operating system has to manage have gone up by about six orders of magnitude in the last 40-ish
years. So without me saying another word, that makes managing operating system state, which is keeping track of tasks, resources, processes, files, all that stuff, a database problem. And so you want to apply database technology to what’s obviously a database problem. That’s number one.

Number two, Linux is now 40-ish years old and Unix is 50-ish years old, and that makes them legacy software that’s been maintained, patched, and extended over a long period of time. The Linux community is having a very hard time making forward progress. For example, there is no multi-node version of Linux, and so everybody who is running a multi-node system, which includes most everybody, has to run a multi-node orchestrator, something like Kubernetes. Also, Linux is well known to be a leaky security boat, so people layer all kinds of security stuff on top of it. What you have is a patchwork of operating system software that is a management nightmare and is still a leaky security boat. So the fact that Linux is legacy means that it’s time to send it to the home for tired software. Scale and legacy are the reasons to start with a clean slate and start anew.

Richie: Okay, that multi-node idea in particular sounds interesting. Earlier you mentioned that this is a solved problem with databases, because now that databases are in the cloud they have to work on multiple servers at once. I think I can see where this is going in terms of having the operating system have that capability as well. But I am old enough to remember that this has been tried before. Maybe 20 years or so ago, Microsoft tried to put a database at the heart of the operating system with WinFS, and then they abandoned that project. So what’s different this time?

Mike: So the project, which was called
Longhorn internally, I sort of sleuthed around a bunch as to what happened when we started building DBOS. The general consensus inside Microsoft was that Longhorn was a good idea. Nobody disputed the ideas, and everybody internally said that its problems were bad management and feature creep: it got more and more ambitious before it ever worked. So I think technically it’s a very good idea, and was then, and it suffered from internal politics, internal management issues, and especially feature creep. I’ve been involved with or watched a whole bunch of startups, and the worst thing in the world that any startup can do is engage in feature creep. You want to get something running, and once you get it running, then you can worry about extending it. Microsoft didn’t pay attention to that lesson.

Richie: It’s definitely a big problem, being tempted to add lots of features, because when you’re building something from scratch there are many exciting things you want to add to it. But I can see how that can become dangerous if you’ve not shipped anything. So one of the things you said you were excited about in the world of databases is that there are lots of application-specific databases for different use cases, and it feels like the same thing is also true of operating systems: there are different operating systems for different use cases. So what is your intended use case for DBOS?

Mike: DBOS started as an academic research project in 2020, jointly between MIT and Stanford, and so we had a running system that we decided to commercialize. Like any startup, the mantra is: get a product out as quickly as you can, and then, in the vernacular, see if the dogs are going to eat the dog food. If they’re not, they will tell you instantly why they don’t like it. So our goal is to get
a product out as quickly as possible, and we did that. In a year we’ve shipped a commercial version of DBOS. And we had the advantage of having a person named Michael Coden be part of the research project and also part of the commercial venture. Until recently he was the managing partner of the cybersecurity practice for Boston Consulting Group, and so through him we got to talk to lots of enterprise folks, big and small, in a whole bunch of different areas. Here’s who saluted in those conversations.

First of all, the three-letter agencies and the defense industry, who are really focused on security.

The second place where people really saluted was financial services, dealing with moving money around. One thing I learned from interviewing a large regional bank here in the Northeast was this: they talked with us and said, wow, you solve the once-and-only-once problem. That was the first I’d ever heard of that, but here’s what he actually meant. If I’m going to move $10 from my account to Richie’s account, then we’re almost certainly in different systems. So the way the transaction should work is: you debit my account, send a message to your account, increment your account, send a return message, and then commit the transaction. This is basically a distributed commit problem, and in the banking world most of the systems that this regional bank uses don’t have XA support or any distributed commit support, so the bank was forced to do this themselves. They figure that somewhere between a third and half of their application logic is dealing with once and only once, and it’s brittle, problem-prone, and hard to get right. Distributed commit is not for the faint of heart. So they would love to get rid of all that code, and we do it automatically, because the network system is in the database, and so we control all
pieces of that interaction. So we solve the once-and-only-once problem automatically, and he was very excited about that. So financial services worries about distributed commit and also worries about security.

The third place people got very interested was what I’ll call scuffed-shoe enterprises, which are enterprises that bend metal and do real things. One particular enterprise, who I won’t name, described their current security system. They have 100,000 endpoints; they’re a very big conglomerate. They have one particular security vendor who does event extraction off of those 100,000 endpoints; that’s a seven-digit-a-year subscription in US dollars. They then send those events through their enterprise’s custom workflow system, which enriches the events with enterprise-specific data. They then pass these enriched events to another security product, another seven-digit-a-year subscription, and they and the vendor have written several hundred monitoring rules. In production, if a monitoring rule fires, they have a human analyst look at it to make sure it’s not a false positive, and if it’s not a false positive, then they take action. The elapsed time between when a bad guy knocks at the door and them taking action is measured in multiple hours, and they are just terrified of ransomware attacks. Many of their peers have succumbed to ransomware attacks; it tends to take all production down for multiple days at a time and costs a billion dollars if you’re a big enterprise. So they are terrified about security, and the thing they love about DBOS is that DBOS recovers automatically from ransomware attacks. The reason we can do that is we have everything in the database. All the operating system state is in the database, and we keep a log
of that state historically; we spool it into a data warehouse. So if you want to back the operating system up 18 minutes, you just do it. And if this is fast enough for the operating system, it’s certainly fast enough for your application, so if your application puts all its state in the database, then when we back up 18 minutes, we back up everything 18 minutes. If you had a ransomware attack 17 minutes ago, you just back up 18 minutes, single-step around the bad guy, and let the system go. So they’re very excited about our security story. Their problem is that they are dragging a huge legacy code base around, and to take advantage of DBOS you would have to restructure, refactor, and rewrite a bunch of it. That’s a project for the decades, and so that will be a very, very slow market on the uptake.

So with that said, what we’re aiming for as the initial commercial DBOS users are the three-letter agencies, financial services, especially the startups, and adventuresome enterprises who are willing to refactor stuff on the way to the cloud. That was a long answer to your short question.

Richie: Yeah, I have to say I was thinking that swapping out your operating system is probably going to result in a fairly small productivity boost, but the things you’re talking about there are pretty dramatic. The idea that just solving this distributed transaction problem is going to enable banks to cut out half of their code base, that’s going to be a huge productivity boost for maintenance. And then the manufacturing example of just saying, okay, we’ve got better security, this is going to protect us from ransomware: this is pretty interesting stuff.

Mike: The thing I found really exciting is that most everyone we talk to about the idea thinks it’s conceptually fantastic, and so getting to an implementation is a small matter of legacy code. So anyway, I’m
really excited about the possibilities.

Richie: This does sound pretty cool. So it sounds like the big use case of this is going to be around infrastructure for cloud applications, via your software-as-a-service stuff. Can you talk me through, in general, what the infrastructure of cloud applications currently looks like?

Mike: Well, to start with, if you move to the cloud then, as I said earlier, you’re highly encouraged to take a software-as-a-service model, and you get disaggregated storage, stuff like S3. Our point of view is that if the world is moving from on-prem to the cloud, we should go to where the market’s going to be, not where it was. So right now DBOS is a cloud-only service. It runs as software as a service, so that you are only using resources when you are actually running, and if you’re idle you’re not using any.

The other thing to note is that transactional databases have gotten wildly faster in the last decade or two, so it’s fast enough to put a database system at the bottom, and that’s what we do. So we run on the bare metal. Well, we run on a microkernel, and at some point we may well write our own microkernel so that we’re really running on the bare metal. Linux is nowhere in sight. Right now we’re running on AWS, on one of their microkernels called Firecracker, and the database system is the only thing running on top of Firecracker. On top of the database system we’ve written a file system, a messaging system, and a bunch of schedulers, and your application runs on top of that stuff.

Just so everybody gets really clear on what we have in mind, let me tell you how the messaging system works. It is not TCP/IP at all. It is not a heavyweight thing, and it’s all written in SQL. What do I mean by that? Well, there’s a message table with a sender, a receiver, and a payload, and to send a message you do an insert into that table.
That’s one line of SQL. We’re running on top of a partitioned, multi-node, highly available DBMS, so that tuple ends up at the home site of the receiver, and to read a message you just do a SQL query, another one line of SQL. So that’s the message system. Once you go outside of our environment, we have a gateway that goes out onto TCP/IP and the rest of the world, but inside us it’s all just the database. Everything is just the database, and the database system is the only thing running besides a small microkernel.

The way software as a service works is that you have to structure your application as a graph of workflow steps. You can call them micro-operations; they’re just pieces of code. We used to run on JavaScript and Java, and now we’ve moved to TypeScript because it seems to be more popular. So you write a collection of operations in TypeScript and you tell us the graph of those operations; that’s the way you have to structure your application to get software as a service to work. We accept that graph and those operations, we store them in the database, and we have a tiny orchestrator that wakes up any given operation when its inputs are available, and it produces an output. All of that’s in the database. So you just have to write a graph of TypeScript and we take care of everything else. We run it for you; if it seems to be running slowly, we give you more resources, and so forth.

Richie: Okay, so I have to say I love the idea that the messaging system is just a table in a database, so you can write SQL queries against it. I’d love to get into that a little bit more later, but just for now, what are the implications for anyone who wants to develop applications on top of DBOS?

Mike: Well, we run on the cloud. At some point we will probably support an on-prem deployment, but on-prem comes with just a ton of idiosyncratic behavior, you
know, on the part of whatever your shop is doing. But we run on AWS, and we will run in the near future on Azure and on GCP. So all you have to do is produce this graph of TypeScript. You have to be using TypeScript, and I expect in very short order we will support 10 languages, because TypeScript is popular but it’s certainly not universal. We’ll probably support Java, we’ll probably support JavaScript, we’ll probably support Python, we’ll support Go; I mean, we’ll support languages as there’s interest in us supporting them. In short order we will run on the popular clouds in a variety of programming languages.

What you get is that every one of your operations is a transaction. One of the big problems people have with software-as-a-service applications is that they’re broken up into a whole bunch of steps that are running in parallel, and if there are race conditions between the parallel operations, those are fiendish race conditions. If you know the word Heisenbug, it was a term coined by Jim Gray, the late Jim Gray. We avoid almost all Heisenbugs.

We also give you a debugger, because if we can back up the operating system, we can also back up your application. So if you’re in debug mode and something bad happens, you can simply back up three minutes, single-step forward, change the code, change the data. We give you a really nifty debugging environment, and we give you transactions for everything.

And if you want to ask questions about your application, the state of your application is in the same data warehouse as the operating system stuff, so you can just use SQL to ask questions about what’s happening. For example, if you think that I’m possibly a bad actor, you want to know who I’ve sent messages to, who they sent messages to, the transitive closure of who I’ve ever talked to, and that’s just
SQL. Right now, asking questions about what’s going on is really, really difficult. It’s also very easy to do monitoring. Right now you have to move what amounts to the event log into some proprietary system that talks a proprietary language. In our world it all goes into a data warehouse and you can just query it in SQL. So you get much easier monitoring, you get a fancy debugger, you get super security, you get multi-node support, you don’t have to run Kubernetes, and you’re not running Linux. You get a much simpler environment to maintain, a lot fewer moving parts, a lot more security, and you get a next-generation programming environment.

So it’s very attractive to application developers. We’ve talked to a whole bunch of them, and most of them think, wow, this is really neat, and then they say, yeah, but you don’t support Go, or, you know, pick an objection. So the real question is this. We could support all of POSIX, which is sort of yesterday’s standard. I’m really reluctant to do that, because most people on the cloud don’t care about POSIX at all. They’re focused on workflow standards, stuff like that. So we’d like to be supportive of the standards that are coming rather than yesterday’s news. But the question is how big an application surface we have to support in order to get traction, and we’ll find out, or we are finding out as we speak.

Richie: Okay, I think your point about Heisenbugs is really interesting, because certainly as a user of web applications I’ve often had the experience where something’s gone wrong, I report the bug, and then the response back is, well, I can’t reproduce this, it’s maybe a temporary glitch. And then from the engineer’s side they’re like, well, how do I fix this if I can’t reproduce it? So the idea that those categories of temporary problems are going to largely go away
because you’ve got that state, that does seem like a huge step forward.

Mike: Well, database systems have this problem in spades. I worked for a bunch of different database companies, and if you have a Heisenbug, the user sends you the bug and you can’t reproduce it. So if the customer is important enough, you put engineers on airplanes, go to the customer site, and put a print statement everywhere in sight. They’re fiendish, so this is serious business, and we make things a lot better.

Richie: Going back to what you said about how, because everything is SQL internally, you can start doing queries on it: what sort of queries might I want to run against an operating system?

Mike: Who in my environment is using more than 100 gigabytes of space, counting only those files bigger than, you know, one gigabyte? You can’t ask that now, and it’s just SQL in our system. Or which three users are chewing up the most resources? Is there anybody who has copied more than 20 files in the last 12 hours? Et cetera, et cetera.

Richie: From the application developer point of view, can you also do debugging by writing SQL queries?

Mike: Well, yeah, of course, but we also give you this time-travel debugger. But sure, you can write SQL against what amounts to the event log in the data warehouse, and that works great.

Richie: Okay, so it sounds like if you’ve got a few SQL skills, then it’s going to make a lot of what’s happening within the operating system much more accessible. I’d also like to talk a bit about cybersecurity. You mentioned before that security is one of the main features and that you can recover from these ransomware attacks. Are there any other security benefits from running DBOS as opposed to another operating system?

Mike: Well, there’s a
much smaller attack surface compared to the traditional world, so there are a lot fewer gates to close. In the current world you start with Linux, a leaky boat, and you paper over that with a bunch of stuff, and in our opinion that gets you another leaky boat. The way to make security better is to have a much simpler system with a lot less attack surface, and that’s exactly what we do. So there’s much less attack surface, you can do much better monitoring, and you can recover from ransomware attacks easily. That’s our main security story.

Richie: Okay, just having a simpler system means fewer things can go wrong. That seems useful to know. I’d also like to ask you a bit about what it’s like to create a startup, because often creating a startup is very much seen as a young person’s game, and, well, you’ve been around for a while now. So can you tell me what inspired you to create another startup?

Mike: My career has been in academia, and most people in academia want to get famous and write papers, and whether or not they do anything meaningful to the real world is irrelevant. Somehow, early on, I decided that it’s important to try and make change, and I learned a long while ago that the very big companies don’t invest in new stuff. They by and large buy startups after they’re somewhat successful. So the way to make change happen is to do startups, and whenever I’ve had an idea that looked like it was commercializable and could make a difference, I’ve done a startup. After a while they get easier and easier for someone like me to do, and nobody has yet complained about my age, which everybody should complain about, since I’m old. But I’m going to keep doing this as long as I can make a difference.

Richie: That’s wonderful. It’s very inspiring that you keep plugging away at this, keep coming up with ideas, and keep creating stuff. All right, can you tell me a bit
about what you’re working on with DBOS right now? What’s coming soon?

Mike: Okay, so we’ve talked about what the commercial guys are thinking about, which is getting the first lighthouse customers, making them happy, and doing whatever they need to be successful. That’s exactly what you would expect. The academic research on DBOS goes on; that hasn’t stopped. The thing that I’m most interested in is that if you look at database transactions, the pole in the tent as to what consumes CPU time looks like it’s pretty much moved to being the networking system, and so to go faster you’ve got to redo the networking system. So we’re looking at all kinds of ways to send messages faster than inserting them into a database table and then reading them back out again. We’re working on the high poles in the current transactional database stack, and that’s something that I’m very, very interested in. Another thing is that everybody on the planet is dabbling in large language models, as am I, and the question of the day is what large language models can do for structured data in database systems. So I’m plugging away at that. That’s what I’m focused on in an academic context.

Richie: Okay, faster networking and large language models; it’s exciting stuff. All right, so do you have any final advice for people who are interested in using DBOS?

Mike: Sure, you know, get at it. It works, and works well, and it has a ton of advantages. So go to our website and kick the tires. You can sign up to freely use the cloud version, the software-as-a-service version. We’re excited to try and get feedback, and it costs nothing except small amounts of your time. So have at it, and tell us what you think.

Richie: All right, super. Everyone,
you’ve got a call to action there, audience. All right, thank you very much for your time, Mike. That was great.

Mike: Okay, thank you, Richie.

[Music]
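
The message table described in the interview (a sender, a receiver, and a payload, with send as an insert and read as a query) can be tried out with any SQL database. What follows is a minimal sketch, assuming SQLite through Python's standard `sqlite3` module as a single-node stand-in. The table name `message`, the `id` column, and the helper functions are hypothetical illustrations, not the DBOS schema or API. It also shows the "who has this user ever talked to, transitively" question as one recursive SQL query.

```python
import sqlite3


def open_message_store() -> sqlite3.Connection:
    """Create an in-memory database holding a hypothetical message table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE message ("
        " id INTEGER PRIMARY KEY,"   # assumption: added only to order messages
        " sender TEXT NOT NULL,"
        " receiver TEXT NOT NULL,"
        " payload TEXT NOT NULL)"
    )
    return conn


def send(conn: sqlite3.Connection, sender: str, receiver: str, payload: str) -> None:
    """Sending is a single INSERT, committed as one transaction."""
    with conn:
        conn.execute(
            "INSERT INTO message (sender, receiver, payload) VALUES (?, ?, ?)",
            (sender, receiver, payload),
        )


def receive(conn: sqlite3.Connection, receiver: str) -> list[tuple[str, str]]:
    """Reading is a single SELECT over the same table."""
    rows = conn.execute(
        "SELECT sender, payload FROM message WHERE receiver = ? ORDER BY id",
        (receiver,),
    )
    return rows.fetchall()


def reachable_from(conn: sqlite3.Connection, start: str) -> list[str]:
    """Transitive closure of 'has sent a message to', starting from one sender."""
    rows = conn.execute(
        """
        WITH RECURSIVE contacts(name) AS (
            SELECT receiver FROM message WHERE sender = ?
            UNION  -- UNION (not UNION ALL) drops repeats, so cycles terminate
            SELECT m.receiver
            FROM message AS m
            JOIN contacts AS c ON m.sender = c.name
        )
        SELECT name FROM contacts ORDER BY name
        """,
        (start,),
    )
    return [row[0] for row in rows]


conn = open_message_store()
send(conn, "mike", "richie", "hello")
send(conn, "richie", "dana", "fwd: hello")
send(conn, "dana", "mike", "thanks")
print(receive(conn, "richie"))
print(reachable_from(conn, "mike"))
```

The sketch leaves out what the interview says the real system adds: a partitioned, multi-node, highly available DBMS that places each tuple at the receiver's home site, and a historical log spooled into a data warehouse.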