1 00:00:02,500 --> 00:00:04,580 SIREN WAILS 2 00:00:08,460 --> 00:00:13,300 In Los Angeles, a remarkable experiment is under way. 3 00:00:13,300 --> 00:00:16,540 Face the wall, face the wall before I put you in handcuffs. 4 00:00:16,540 --> 00:00:20,500 The police are trying to predict crime before it even happens. 5 00:00:20,500 --> 00:00:22,580 It actually gives us a forecast 6 00:00:22,580 --> 00:00:26,060 about where crime is most likely to happen in the next 12 hours. 7 00:00:28,860 --> 00:00:32,340 In the City of London, this scientist-turned-trader 8 00:00:32,340 --> 00:00:36,660 believes he's found the secret of making millions, with maths. 9 00:00:36,660 --> 00:00:42,740 The potential to do things with data is fantastic, fantastic. 10 00:00:42,740 --> 00:00:47,420 And in South Africa, this star-gazer has set out to catalogue 11 00:00:47,420 --> 00:00:52,740 the entire cosmos, by listening to every single star. 12 00:00:55,140 --> 00:00:59,860 What unites these different worlds is an explosion in data. 13 00:01:03,860 --> 00:01:06,300 The volume of it, the dynamic nature of the data 14 00:01:06,300 --> 00:01:08,300 is changing how we live our lives. 15 00:01:10,340 --> 00:01:11,820 In just the last few years, 16 00:01:11,820 --> 00:01:15,300 we've produced more data than in all of human history. 17 00:01:17,500 --> 00:01:21,660 In this film, we follow the people who are mining this data. 18 00:01:21,660 --> 00:01:24,820 It's set to become one of the greatest sources of power 19 00:01:24,820 --> 00:01:26,180 in the 21st century. 20 00:01:46,580 --> 00:01:50,100 6am, Los Angeles. 21 00:01:50,100 --> 00:01:53,100 The start of shift in the Foothill division. 22 00:01:55,460 --> 00:01:59,300 Officer Steve Nunes, a 12-year veteran of the LAPD, 23 00:01:59,300 --> 00:02:03,100 and his partner Danny Fraser head out to patrol. 
24 00:02:06,460 --> 00:02:08,460 Right now, we're north of Los Angeles, 25 00:02:08,460 --> 00:02:11,460 downtown Los Angeles, in the San Fernando Valley area. 26 00:02:13,740 --> 00:02:16,780 Their beat is one of LA's toughest neighbourhoods. 27 00:02:18,820 --> 00:02:22,660 There's a lot of BFMVs, burglary from motor vehicles. 28 00:02:22,660 --> 00:02:24,300 There's a lot of robberies, 29 00:02:24,300 --> 00:02:28,100 there's a lot of gang and narcotic activity over here. 30 00:02:28,100 --> 00:02:31,420 There's a lot of people selling drugs. 31 00:02:31,420 --> 00:02:34,060 The gang that's in this area are called the Project Boys. 32 00:02:34,060 --> 00:02:35,420 They're a Hispanic gang. 33 00:02:37,700 --> 00:02:39,700 Despite their experience 34 00:02:39,700 --> 00:02:41,820 and intimate knowledge of the neighbourhood, 35 00:02:41,820 --> 00:02:46,260 today, their patrol is being controlled by a computer algorithm. 36 00:02:57,580 --> 00:02:59,940 You know, I wasn't really too happy about it, 37 00:02:59,940 --> 00:03:02,740 you know, specially as a police officer 38 00:03:02,740 --> 00:03:06,020 you know, we kind of go off of what we know from our training. 39 00:03:06,020 --> 00:03:08,340 We weren't too happy about a computer telling us 40 00:03:08,340 --> 00:03:10,020 where we need to do our police work 41 00:03:10,020 --> 00:03:11,860 and what area we need to drive around. 42 00:03:11,860 --> 00:03:14,940 Steve and Danny are part of a ground-breaking trial. 43 00:03:15,980 --> 00:03:17,740 An equation is being used 44 00:03:17,740 --> 00:03:20,380 to predict where crime will occur on their watch. 45 00:03:23,220 --> 00:03:26,460 I saw some people hanging out by the laundry, like the little laundry. 46 00:03:27,580 --> 00:03:28,540 I guess... 47 00:03:30,540 --> 00:03:32,820 If its predictions are correct, 48 00:03:32,820 --> 00:03:35,300 the system will be rolled out across all LA... 49 00:03:35,300 --> 00:03:38,420 Hey, stop, yeah, stop. 
Stop, stop, stop. 50 00:03:38,420 --> 00:03:40,860 Put your hands on your head. 51 00:03:40,860 --> 00:03:42,740 ..and the computer algorithm 52 00:03:42,740 --> 00:03:45,740 will become a routine part of Steve's working life. 53 00:03:49,500 --> 00:03:52,460 Spread your feet, face forward. Have anything on you? 54 00:03:52,460 --> 00:03:56,100 Stop moving. Face the wall, face the wall before I put you in handcuffs. 55 00:03:57,500 --> 00:03:58,980 You a Project Boy too or no? 56 00:04:00,940 --> 00:04:02,820 The ambition to predict crime 57 00:04:02,820 --> 00:04:06,220 was born out of a remarkable collaboration 58 00:04:06,220 --> 00:04:08,300 between the LAPD... 59 00:04:11,100 --> 00:04:13,140 ..and the University of California. 60 00:04:18,820 --> 00:04:22,940 Jeff Brantingham might seem an unlikely crime fighter. 61 00:04:22,940 --> 00:04:24,460 A professor of anthropology, 62 00:04:24,460 --> 00:04:28,980 he is an expert on remote hunter-gatherer tribes in China, 63 00:04:28,980 --> 00:04:34,340 but he's convinced that from remote China to gangland LA, 64 00:04:34,340 --> 00:04:37,300 all human behaviour is far more predictable 65 00:04:37,300 --> 00:04:38,940 than you might like to believe. 66 00:04:40,340 --> 00:04:44,220 We all like to think that we are in control of everything, 67 00:04:44,220 --> 00:04:48,740 but in fact all of our behaviour is very regular, 68 00:04:48,740 --> 00:04:52,780 very patterned in ways that is often frightening to us. 69 00:04:52,780 --> 00:04:54,420 Offenders are no different. 70 00:04:54,420 --> 00:04:57,460 They do exactly the same things over and over and over again, 71 00:04:57,460 --> 00:04:59,500 and their criminal offending patterns 72 00:04:59,500 --> 00:05:02,940 emerge right out of that regularity of their behaviour. 
73 00:05:06,460 --> 00:05:08,020 Jeff believed he could find 74 00:05:08,020 --> 00:05:10,460 repeating patterns of criminal behaviour 75 00:05:10,460 --> 00:05:13,500 in the LAPD's vast dataset - 76 00:05:13,500 --> 00:05:17,860 13 million crimes recorded over 80 years. 77 00:05:17,860 --> 00:05:21,420 The LAPD have droves and droves of data 78 00:05:21,420 --> 00:05:23,980 about where and when crimes have been occurring. 79 00:05:23,980 --> 00:05:27,220 It represents a treasure trove of potential information 80 00:05:27,220 --> 00:05:29,340 for understanding the nature of crime. 81 00:05:31,180 --> 00:05:37,100 The LAPD already use their crime data to identify hotspots of crime, 82 00:05:37,100 --> 00:05:40,300 but that only tells them where crime has already struck. 83 00:05:41,340 --> 00:05:44,140 We've gotten very good at looking at dots on a map, 84 00:05:44,140 --> 00:05:47,100 and where, where crime has occurred 85 00:05:47,100 --> 00:05:49,060 and the problem with that is 86 00:05:49,060 --> 00:05:51,580 that, sometimes, you're making an assumption 87 00:05:51,580 --> 00:05:53,980 that today is the same as yesterday. 88 00:05:58,340 --> 00:05:59,820 Jeff Brantingham planned to do 89 00:05:59,820 --> 00:06:03,860 something more radical and more useful - predict the future. 90 00:06:05,340 --> 00:06:08,020 He believed he could use patterns in the crime data 91 00:06:08,020 --> 00:06:11,180 to predict where and when crime was likely to occur. 92 00:06:18,700 --> 00:06:22,140 We've long used the patterns in nature to make predictions. 93 00:06:24,340 --> 00:06:27,660 From the setting sun, we learned when to expect the new day. 94 00:06:33,140 --> 00:06:35,380 The phases of the moon allowed us to forecast 95 00:06:35,380 --> 00:06:37,180 the ebb and flow of the tides. 96 00:06:41,540 --> 00:06:44,020 And from observing the patterns of the stars, 97 00:06:44,020 --> 00:06:46,260 we mastered the art of navigation. 
98 00:06:50,180 --> 00:06:54,620 But Jeff Brantingham wanted to do something far more ambitious. 99 00:06:54,620 --> 00:06:56,580 He wanted to tease out patterns 100 00:06:56,580 --> 00:07:00,300 in the apparent chaos of human behaviour, 101 00:07:00,300 --> 00:07:06,980 to uncover them in the LAPD's vast dataset of 13 million past crimes. 102 00:07:06,980 --> 00:07:09,900 You can have gut feelings about the crime but, ultimately, 103 00:07:09,900 --> 00:07:13,580 you need to think about working in a mathematical framework 104 00:07:13,580 --> 00:07:16,820 because mathematics gives you the ability to understand 105 00:07:16,820 --> 00:07:19,540 exactly why things are happening within the data 106 00:07:19,540 --> 00:07:21,340 in a way that gut feelings do not. 107 00:07:24,380 --> 00:07:26,940 Jeff needed an expert in pattern detection. 108 00:07:29,220 --> 00:07:33,820 He turned to his colleague, UCLA mathematician George Mohler. 109 00:07:33,820 --> 00:07:35,500 As mathematicians, 110 00:07:35,500 --> 00:07:41,020 we're interested in understanding what's around you so, you know, 111 00:07:41,020 --> 00:07:44,620 how do waves propagate if you throw a pebble into the water? 112 00:07:45,780 --> 00:07:47,900 The distribution of trees in a forest. 113 00:07:47,900 --> 00:07:52,100 So mathematical models can help you understand those types of things. 114 00:07:58,180 --> 00:08:00,300 George could use mathematical tools 115 00:08:00,300 --> 00:08:02,700 to see what was hidden in the crime data. 116 00:08:03,940 --> 00:08:06,020 And there were hints of a pattern in it. 117 00:08:08,020 --> 00:08:11,260 What you see is that after a crime occurs, there's an elevated risk 118 00:08:11,260 --> 00:08:13,700 and that risk travels to neighbouring regions. 
119 00:08:13,700 --> 00:08:16,980 So what we wanted to do is develop a model to take that into account 120 00:08:16,980 --> 00:08:19,220 so police could maybe use that information 121 00:08:19,220 --> 00:08:21,260 to prevent those crimes from occurring. 122 00:08:25,020 --> 00:08:29,140 He started with a mathematical model that was already being used, 123 00:08:29,140 --> 00:08:32,140 right here on the west coast of America. 124 00:08:41,140 --> 00:08:43,540 Southern California is earthquake country. 125 00:08:45,860 --> 00:08:47,820 Sitting on the San Andreas Fault, 126 00:08:47,820 --> 00:08:50,220 there's an average of 10,000 earthquakes 127 00:08:50,220 --> 00:08:51,860 and after-shocks every year. 128 00:08:53,460 --> 00:08:58,700 The biggest for 100 years was the Loma Prieta earthquake of 1989. 129 00:09:00,340 --> 00:09:04,620 Its epicentre was here, just outside Santa Cruz, California. 130 00:09:08,340 --> 00:09:10,380 There is quite simply no mathematical model 131 00:09:10,380 --> 00:09:12,340 that can predict an earthquake like this one. 132 00:09:15,220 --> 00:09:17,460 But after the earthquake come the after-shocks 133 00:09:17,460 --> 00:09:19,140 and that's a different matter. 134 00:09:22,260 --> 00:09:25,180 So we're several hundred metres from the epicentre. 135 00:09:25,180 --> 00:09:28,020 Nearby was one of the after-shocks 136 00:09:28,020 --> 00:09:30,340 of the original Loma Prieta earthquake. 137 00:09:32,340 --> 00:09:35,660 After a large earthquake occurs, there is a probability 138 00:09:35,660 --> 00:09:39,420 that another earthquake will follow nearby in space and time. 139 00:09:40,740 --> 00:09:43,660 George discovered seismologists had found a pattern 140 00:09:43,660 --> 00:09:45,700 to earthquake after-shocks 141 00:09:45,700 --> 00:09:49,340 and developed an algorithm to predict these after-shock clusters. 142 00:09:51,580 --> 00:09:55,340 These types of clustering patterns are also seen in crime data. 
143 00:09:55,340 --> 00:10:00,340 So, after a crime occurs, you will see an increased likelihood 144 00:10:00,340 --> 00:10:03,780 of future events nearby in space and time. 145 00:10:03,780 --> 00:10:06,380 You can think of them as after-shocks of crime. 146 00:10:09,820 --> 00:10:11,700 George and Jeff took the equation 147 00:10:11,700 --> 00:10:13,940 for predicting earthquake after-shocks 148 00:10:13,940 --> 00:10:16,340 and began to adapt it to predict crime. 149 00:10:18,420 --> 00:10:22,260 So the model is broken into several parts, 150 00:10:22,260 --> 00:10:25,900 so the overall rate of crime, which we'll call Lambda, 151 00:10:25,900 --> 00:10:29,540 models the rate of events in space and time. 152 00:10:29,540 --> 00:10:31,060 We use the Greek letter Mu 153 00:10:31,060 --> 00:10:35,780 to represent the background amount of crime that's going on. 154 00:10:35,780 --> 00:10:38,260 The second component to Lambda is G. 155 00:10:38,260 --> 00:10:42,780 G models the distribution of crimes following an initial event. 156 00:10:42,780 --> 00:10:47,300 This whole term overall describes what we call self-excitation, 157 00:10:47,300 --> 00:10:49,820 that a crime that occurs today 158 00:10:49,820 --> 00:10:53,740 actually self-excites the possibility of future crimes. 159 00:10:53,740 --> 00:10:57,180 So Lambda equals Mu plus G, is that right? 160 00:10:57,180 --> 00:11:00,140 Well, sort of, so Lambda equals Mu 161 00:11:00,140 --> 00:11:04,980 plus G positioned at all the past events in your dataset. 162 00:11:08,380 --> 00:11:11,900 George and Jeff took their algorithm back to the streets of LA. 163 00:11:18,540 --> 00:11:21,980 When they plugged the old crime data into the equation, 164 00:11:21,980 --> 00:11:25,980 it generated predictions that fitted what had happened in the past. 165 00:11:30,300 --> 00:11:33,740 But could it also predict the future? 
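[Editor's illustration, not part of the broadcast] The model George describes, a background rate plus a term G "positioned at all the past events", has the shape of a self-exciting point process. The sketch below is only one way such a rate could be computed. The film does not state the form of G, so the exponential decay in time, the Gaussian fall-off in space, the function name `crime_rate` and every parameter value here are assumptions chosen for illustration, not the method used in the LAPD trial.

```python
import math


def crime_rate(t, x, y, past_events,
               background=0.1, strength=0.5, decay=0.2, spread=0.3):
    """Rate Lambda at time t and location (x, y).

    Lambda = Mu + sum of G over every past event, where Mu is the
    `background` rate. The kernel G used here (exponential in time,
    Gaussian in space) and all default values are illustrative assumptions.
    """
    rate = background  # Mu: crime that happens regardless of recent events
    for event_t, event_x, event_y in past_events:
        elapsed = t - event_t
        if elapsed <= 0:
            continue  # only events strictly in the past add risk
        dist_sq = (x - event_x) ** 2 + (y - event_y) ** 2
        # The elevated risk fades as time passes since the event...
        in_time = decay * math.exp(-decay * elapsed)
        # ...and falls off with distance from where it happened.
        in_space = math.exp(-dist_sq / (2 * spread ** 2)) / (2 * math.pi * spread ** 2)
        rate += strength * in_time * in_space  # G placed at this past event
    return rate
```

With a single past event at the origin, `crime_rate(1.0, 0.0, 0.0, [(0.0, 0.0, 0.0)])` comes out above the background value, and it drops back towards the background as `t` grows or as the location moves away, which is the "after-shock" behaviour described in the interview.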
166 00:11:37,420 --> 00:11:40,860 They began to produce daily crime forecasts, 167 00:11:40,860 --> 00:11:46,140 identifying hotspots where crime was likely to strike in the future. 168 00:11:47,980 --> 00:11:54,900 11 Nunes, there. Sir. 23 Fowler. Wallier. Sir. 169 00:11:54,900 --> 00:11:57,620 Let's go to the mission maps if you would, please. 170 00:11:58,860 --> 00:12:02,780 Today, the LAPD is putting these predictions to the test. 171 00:12:03,820 --> 00:12:08,300 The cops in Foothill are assigned boxes of just 500 square feet 172 00:12:08,300 --> 00:12:11,300 where the algorithm predicts crime is most likely to occur 173 00:12:11,300 --> 00:12:13,780 in their 12-hour watch. 174 00:12:13,780 --> 00:12:16,020 Right, predictive mission for today is, 175 00:12:16,020 --> 00:12:18,060 we've got a few boxes here to address, 176 00:12:18,060 --> 00:12:23,700 in Adam 11's area, 12260 Foothill Boulevard. 177 00:12:23,700 --> 00:12:27,060 They're instructed to hit their boxes as often as they can. 178 00:12:27,060 --> 00:12:29,220 Osborne and Foothill Boulevard. 179 00:12:29,220 --> 00:12:31,260 So you've got your mission for the day? 180 00:12:31,260 --> 00:12:33,540 So let's go out there, have fun and be safe. 181 00:12:50,660 --> 00:12:53,140 Yeah, there is a homicide blinking up there. 182 00:12:54,820 --> 00:12:59,260 The trial is monitored at the real time crime centre in downtown LA. 183 00:13:08,780 --> 00:13:10,660 What we're looking at here 184 00:13:10,660 --> 00:13:14,060 is the forecast that was produced by the PredPol software. 185 00:13:14,060 --> 00:13:16,140 So if you see on the centre of this map, 186 00:13:16,140 --> 00:13:22,420 we've got three nearly contiguous forecast boxes around this area, 187 00:13:22,420 --> 00:13:24,580 and then an adjacent one. 188 00:13:24,580 --> 00:13:27,300 So this is good information for the officers. 
189 00:13:27,300 --> 00:13:30,580 They can go out there, work up and down that street, Sheldon, 190 00:13:30,580 --> 00:13:32,860 and some of those side streets, 191 00:13:32,860 --> 00:13:34,820 and look for criminal activity 192 00:13:34,820 --> 00:13:37,660 or evidence that criminal activity might be afoot. 193 00:13:39,900 --> 00:13:41,340 OK, Roger, we'll take it. 194 00:13:41,340 --> 00:13:43,460 SIREN WAILS 195 00:13:49,460 --> 00:13:52,100 Steve and Danny have got the word to go. 196 00:13:52,100 --> 00:13:55,220 The model has predicted car crime in a box on their beat. 197 00:13:58,140 --> 00:13:59,540 It's a kid. 198 00:14:04,500 --> 00:14:08,940 Yeah, it's the same address as that kid that we had yesterday. 199 00:14:08,940 --> 00:14:11,060 When they reach their assigned hotspot, 200 00:14:11,060 --> 00:14:13,340 they find a cold-plated car. 201 00:14:13,340 --> 00:14:16,020 The licence plates don't match the vehicle. 202 00:14:16,020 --> 00:14:18,860 They're getting what they need, huh? 203 00:14:18,860 --> 00:14:22,540 When they call the number in, it turns out the car's been stolen. 204 00:14:23,780 --> 00:14:26,220 It was an area where there's a lot of GTAs, 205 00:14:26,220 --> 00:14:29,420 which is "grand theft auto", people were stealing cars. 206 00:14:29,420 --> 00:14:30,700 Right out of roll call, 207 00:14:30,700 --> 00:14:33,820 right when we got down one of the boxes they went into, 208 00:14:33,820 --> 00:14:37,420 one of the areas they started patrolling, 209 00:14:37,420 --> 00:14:40,900 right away they ran a car and it came back stolen. 210 00:14:42,820 --> 00:14:45,540 In Foothill, they found using the algorithm 211 00:14:45,540 --> 00:14:49,020 led to a 12% decrease in property crime 212 00:14:49,020 --> 00:14:51,180 and a 26% decrease in burglary. 
213 00:14:51,180 --> 00:14:54,260 At first I said we weren't big on it, you know, 214 00:14:54,260 --> 00:14:56,580 and it came to the point where, little by little, 215 00:14:56,580 --> 00:14:59,820 you start to see crime in certain areas deteriorate 216 00:14:59,820 --> 00:15:02,780 because of us being in that box for, you know, 217 00:15:02,780 --> 00:15:06,300 even ten minutes, twenty minutes, even five minutes. 218 00:15:06,300 --> 00:15:08,900 So, we definitely see how it is working. 219 00:15:11,860 --> 00:15:15,580 The model is continuously updated with new crime data, 220 00:15:15,580 --> 00:15:20,260 helping to make the predictions ever more accurate. 221 00:15:20,260 --> 00:15:23,100 This whole year since January, Foothill area has been 222 00:15:23,100 --> 00:15:26,580 leading the city of Los Angeles in crime reduction, week to week, 223 00:15:26,580 --> 00:15:28,940 so the officers, once it started working, 224 00:15:28,940 --> 00:15:30,180 then we had buy-in from them 225 00:15:30,180 --> 00:15:33,140 and now it's just a regular course of how they do business. 226 00:15:35,340 --> 00:15:37,700 Predictive policing will be rolled out 227 00:15:37,700 --> 00:15:41,380 right across the city of Los Angeles, 228 00:15:41,380 --> 00:15:45,780 and is being trialled in over 150 cities across America. 229 00:15:50,700 --> 00:15:53,940 And predicting crime from crime data is just one way 230 00:15:53,940 --> 00:15:56,060 the data miners are changing our world. 231 00:16:02,860 --> 00:16:06,300 In fact, the tools that Jeff used to mine the LAPD data 232 00:16:06,300 --> 00:16:08,020 can be applied to any dataset. 233 00:16:10,140 --> 00:16:12,620 The vast complexity of the universe... 234 00:16:15,900 --> 00:16:18,020 ..the diversity of human behaviour... 235 00:16:20,140 --> 00:16:23,580 ..even the data we create ourselves every day. 
236 00:16:25,140 --> 00:16:29,500 The data miners are reaching into every area of our lives, 237 00:16:29,500 --> 00:16:35,260 from medicine to advertising, to the world of high finance. 238 00:16:37,860 --> 00:16:41,300 Professor Phil Beales is a geneticist 239 00:16:41,300 --> 00:16:44,660 at the forefront of this data revolution. 240 00:16:44,660 --> 00:16:46,700 The methods he uses today 241 00:16:46,700 --> 00:16:49,580 can be traced back to an extraordinary man 242 00:16:49,580 --> 00:16:52,860 living in London 300 years ago. 243 00:16:52,860 --> 00:16:57,460 The first data miner, the amateur scientist, John Graunt. 244 00:16:59,940 --> 00:17:02,020 Graunt was living through 245 00:17:02,020 --> 00:17:05,660 the greatest health threat of his day, the bubonic plague. 246 00:17:05,660 --> 00:17:07,780 Its causes were an utter mystery. 247 00:17:09,500 --> 00:17:13,700 Graunt began searching for patterns in the parish death records, 248 00:17:13,700 --> 00:17:15,540 known as the Bills of Mortality. 249 00:17:17,820 --> 00:17:19,500 The Bills of Mortality 250 00:17:19,500 --> 00:17:24,340 were essentially random sets of information 251 00:17:24,340 --> 00:17:27,060 which he brought together and organised 252 00:17:27,060 --> 00:17:29,580 and made sense of that information, 253 00:17:29,580 --> 00:17:32,340 so Graunt realised that this information 254 00:17:32,340 --> 00:17:34,060 was essentially a gold mine. 255 00:17:39,580 --> 00:17:42,420 Graunt wanted to know who had died of the plague 256 00:17:42,420 --> 00:17:46,420 and who had died of something else. 257 00:17:46,420 --> 00:17:48,900 He compiled all the death records together. 258 00:17:51,460 --> 00:17:57,660 And this dataset allowed him to see patterns that no-one else had seen. 
259 00:17:57,660 --> 00:18:00,460 He listed a number of the causes of death 260 00:18:00,460 --> 00:18:02,540 and categorised them in such a way 261 00:18:02,540 --> 00:18:08,820 that one can now look back and see exactly what people died of. 262 00:18:08,820 --> 00:18:13,660 For example 38 people had King's Evil, 263 00:18:13,660 --> 00:18:15,500 which is actually tuberculosis of the neck 264 00:18:15,500 --> 00:18:17,740 or otherwise called scrofula. 265 00:18:17,740 --> 00:18:21,020 One patient was bit with a mad dog, 266 00:18:21,020 --> 00:18:23,820 another 12 had French Pox, which is actually syphilis. 267 00:18:26,700 --> 00:18:32,180 And in the plague deaths, Graunt found a revealing pattern. 268 00:18:32,180 --> 00:18:35,660 It overturned an idea that everyone shared at the time 269 00:18:35,660 --> 00:18:37,460 about what caused the disease. 270 00:18:39,820 --> 00:18:45,260 He was able to refute the widely-held belief 271 00:18:45,260 --> 00:18:49,100 that plague might have been caused by person-to-person contact, 272 00:18:49,100 --> 00:18:52,500 and he was also able to refute the widely-held belief 273 00:18:52,500 --> 00:18:56,380 at that time that plague tended to increase 274 00:18:56,380 --> 00:18:59,020 during the first year of the reign of a new king. 275 00:19:03,020 --> 00:19:06,340 And the more Graunt looked at the data, 276 00:19:06,340 --> 00:19:08,740 the more hidden patterns he discovered. 277 00:19:13,220 --> 00:19:17,100 People started to see the city of London in an entirely new way. 278 00:19:19,380 --> 00:19:22,980 He was the first to estimate its population. 279 00:19:22,980 --> 00:19:27,620 He proved more boys were born than girls, 280 00:19:27,620 --> 00:19:29,980 but that higher male mortality 281 00:19:29,980 --> 00:19:33,740 meant the population was soon evenly balanced. 
282 00:19:33,740 --> 00:19:37,180 He showed that surprising and rather useful ideas 283 00:19:37,180 --> 00:19:42,900 could be mined from data, if you knew how to examine it. 284 00:19:42,900 --> 00:19:47,140 This was a completely new way of looking at the information 285 00:19:47,140 --> 00:19:51,580 and from extracting really useful data, 286 00:19:51,580 --> 00:19:53,900 so Graunt was essentially a pioneer. 287 00:19:53,900 --> 00:19:59,660 Graunt was the founding father of statistics and epidemiology, 288 00:19:59,660 --> 00:20:05,420 the study of the patterns, causes and effects of disease. 289 00:20:05,420 --> 00:20:08,300 And it's this same power of data 290 00:20:08,300 --> 00:20:12,700 that has become fantastically valuable in modern medicine. 291 00:20:12,700 --> 00:20:17,100 Today, Professor Phil Beales is mining a new human dataset, 292 00:20:17,100 --> 00:20:20,500 the three billion bits of genetic information 293 00:20:20,500 --> 00:20:24,380 that make up the human genome. 294 00:20:24,380 --> 00:20:29,340 He's searching our DNA for clues to help him diagnose and treat illness. 295 00:20:35,060 --> 00:20:37,900 Let me just take a quick look at you. 296 00:20:37,900 --> 00:20:40,340 Jake Pickett is one of his patients. 297 00:20:44,900 --> 00:20:48,500 When Jake was born, there were no extra skin tags 298 00:20:48,500 --> 00:20:50,900 or extra toes or fingers or anything like that? 299 00:20:50,900 --> 00:20:52,980 I had a skin tag on my arm. 300 00:20:52,980 --> 00:20:57,580 For 14 years, Jake has lived with an unusual range of symptoms, 301 00:20:57,580 --> 00:21:01,820 including learning difficulties, obesity, and poor eyesight. 302 00:21:01,820 --> 00:21:03,740 You had an earring in there? 303 00:21:03,740 --> 00:21:06,140 Yeah. Oh, OK, you weren't born with that! 304 00:21:06,140 --> 00:21:12,020 His unidentified condition has baffled his parents and doctors. 
305 00:21:12,020 --> 00:21:14,740 We've had a lot of tests over the years, and actually, 306 00:21:14,740 --> 00:21:16,860 my paediatrician of the time had said to me, 307 00:21:16,860 --> 00:21:18,540 "He's such a happy, lovely young boy. 308 00:21:18,540 --> 00:21:20,780 "Why do you want to keep sticking him with needles?" 309 00:21:20,780 --> 00:21:23,100 and it made me a bit frightened to keep asking for help, 310 00:21:23,100 --> 00:21:25,340 because then I thought maybe the medics would think 311 00:21:25,340 --> 00:21:26,980 there's something wrong with me. 312 00:21:26,980 --> 00:21:30,340 But in the course of Jake's lifetime, medicine has changed. 313 00:21:33,140 --> 00:21:35,500 Professor Beales now has the tools 314 00:21:35,500 --> 00:21:40,100 that may help Jake and his family unravel this mystery. 315 00:21:40,100 --> 00:21:42,300 ..because they know it's difficult for him. 316 00:21:42,300 --> 00:21:46,660 As part of the blood test today, we will take some of that 317 00:21:46,660 --> 00:21:51,020 and from that blood take the DNA, extract the DNA, 318 00:21:51,020 --> 00:21:54,700 and then we will do the genetic testing on those. 319 00:21:54,700 --> 00:21:56,780 Are you happy with that? Yeah, yeah. 320 00:21:56,780 --> 00:21:58,500 It will take a few weeks. 321 00:21:58,500 --> 00:22:02,620 So the key really is to try to nail down the diagnosis 322 00:22:02,620 --> 00:22:06,260 in this particular situation, if we can. 323 00:22:06,260 --> 00:22:07,860 OK, that's great. 324 00:22:07,860 --> 00:22:10,700 This is just to clean it. 325 00:22:10,700 --> 00:22:14,020 He will search Jake's DNA, 326 00:22:14,020 --> 00:22:16,660 hunting for the tiny telltale variations in his genes 327 00:22:16,660 --> 00:22:18,380 that may have caused his condition. 328 00:22:23,540 --> 00:22:25,940 Just hold still for me. 329 00:22:25,940 --> 00:22:28,260 Every patient whose genes are analysed 330 00:22:28,260 --> 00:22:30,300 adds to the growing database of DNA. 
331 00:22:31,460 --> 00:22:34,500 It helps doctors devise new treatments 332 00:22:34,500 --> 00:22:38,100 and identify previously mysterious conditions. 333 00:22:38,100 --> 00:22:40,700 Well done, it's all done. OK? 334 00:22:40,700 --> 00:22:43,020 Phew! OK? 335 00:22:43,020 --> 00:22:44,660 It wasn't that bad. 336 00:22:48,260 --> 00:22:50,180 Over the last ten years, 337 00:22:50,180 --> 00:22:52,500 this technique has successfully revealed 338 00:22:52,500 --> 00:22:54,980 the genetic basis of many diseases. 339 00:22:54,980 --> 00:22:57,660 We have got here the coverage and... 340 00:22:57,660 --> 00:23:01,100 Good, OK, well it looks like we've got our gene then, doesn't it? 341 00:23:01,100 --> 00:23:03,220 I hope so. OK. 342 00:23:03,220 --> 00:23:05,780 Being able to identify a disease 343 00:23:05,780 --> 00:23:08,860 is often the first step in helping patients. 344 00:23:08,860 --> 00:23:12,700 So patients live with the uncertainty of a lack of diagnosis 345 00:23:12,700 --> 00:23:14,140 for many, many years 346 00:23:14,140 --> 00:23:18,060 and we can't underestimate the benefits and the importance 347 00:23:18,060 --> 00:23:20,380 of having this diagnosis, 348 00:23:20,380 --> 00:23:23,260 so through molecular testing such as this, 349 00:23:23,260 --> 00:23:27,820 we're able to provide those patients with a certain level of comfort 350 00:23:27,820 --> 00:23:32,380 when it comes to a diagnosis, and, in a sense, closure, 351 00:23:32,380 --> 00:23:34,820 so they can move on to the next chapter. 352 00:23:36,300 --> 00:23:39,540 Teasing out the patterns in the human dataset 353 00:23:39,540 --> 00:23:41,100 is transforming medicine. 354 00:23:52,420 --> 00:23:55,020 Data is becoming a powerful commodity. 355 00:23:57,740 --> 00:23:59,820 It's leading to scientific insights 356 00:23:59,820 --> 00:24:02,540 and new ways of understanding human behaviour. 357 00:24:03,940 --> 00:24:09,780 And data can also make you rich, very rich. 
358 00:24:09,780 --> 00:24:11,980 TRADERS SHOUT 359 00:24:16,300 --> 00:24:19,100 When it comes to making money out of data, 360 00:24:19,100 --> 00:24:21,340 David Harding's rather good at it. 361 00:24:21,340 --> 00:24:24,740 30 years ago, he set out 362 00:24:24,740 --> 00:24:26,620 to bring data analysis and algorithms 363 00:24:26,620 --> 00:24:28,460 to the trading floors of the City. 364 00:24:28,460 --> 00:24:33,140 This is how all trading used to be done. 365 00:24:33,140 --> 00:24:37,780 All trading used to be done in rooms full of people like this. 366 00:24:37,780 --> 00:24:40,540 They are shouting the prices they will buy and sell at, 367 00:24:40,540 --> 00:24:44,300 they are agreeing the deals, the rises and falls in the prices 368 00:24:44,300 --> 00:24:47,220 are almost like the rises and falls in the noise level. 369 00:24:54,300 --> 00:24:56,300 Today, the London Metal Exchange 370 00:24:56,300 --> 00:24:58,740 is the only trading pit of its kind in Europe. 371 00:25:01,420 --> 00:25:05,300 Noisy, emotional and chaotic. 372 00:25:09,660 --> 00:25:11,900 To a science graduate from Cambridge, 373 00:25:11,900 --> 00:25:13,780 it came as a bit of a surprise. 374 00:25:17,420 --> 00:25:19,060 When I went into the City, 375 00:25:19,060 --> 00:25:22,860 I assumed because it was the world of banking and high finance, 376 00:25:22,860 --> 00:25:25,860 I assumed that it would all be very, very rational 377 00:25:25,860 --> 00:25:31,020 and very efficient and very disciplined and well-organised, 378 00:25:31,020 --> 00:25:34,340 rather like the body of knowledge 379 00:25:34,340 --> 00:25:37,340 I had been taught at Cambridge in physics and chemistry. 380 00:25:37,340 --> 00:25:40,380 These bodies of knowledge were organised and rational, 381 00:25:40,380 --> 00:25:43,740 and it wasn't at all like I expected. 382 00:25:43,740 --> 00:25:49,100 But that it was, you know, somewhat chaotic, in a way. 
383 00:25:49,100 --> 00:25:52,380 Buying and selling strategy in those days 384 00:25:52,380 --> 00:25:57,460 tended to be governed by instinct and intuition. 385 00:25:57,460 --> 00:26:00,900 I watched the prices going up and down on the board up there. 386 00:26:00,900 --> 00:26:03,700 I plotted graphs by hand, standing at the edge 387 00:26:03,700 --> 00:26:05,380 and followed these graphs 388 00:26:05,380 --> 00:26:08,180 and I became convinced that there was a pattern 389 00:26:08,180 --> 00:26:10,420 to the rises and falls in prices. 390 00:26:12,900 --> 00:26:17,340 David Harding wanted to bring mathematics to the problem. 391 00:26:17,340 --> 00:26:20,060 He believed that if he had enough data, 392 00:26:20,060 --> 00:26:24,340 he could predict patterns in the prices and make money, 393 00:26:24,340 --> 00:26:27,860 but the prevailing wisdom was that this was an impossible task. 394 00:26:27,860 --> 00:26:30,020 According to the financial orthodoxy, 395 00:26:30,020 --> 00:26:31,540 the rises and falls in prices 396 00:26:31,540 --> 00:26:33,700 that take place here are completely random. 397 00:26:33,700 --> 00:26:35,220 Nobody can ever predict them, 398 00:26:35,220 --> 00:26:39,060 however clever they are or however much foresight they have. 399 00:26:39,060 --> 00:26:42,660 Essentially, cutting to the chase, 400 00:26:42,660 --> 00:26:46,380 the idea is that you can't beat the market. 401 00:26:46,380 --> 00:26:50,820 Like all data miners, Harding needed two things. 402 00:26:50,820 --> 00:26:52,500 Data, a lot of it, 403 00:26:52,500 --> 00:26:56,300 and computer algorithms to spot the patterns. 404 00:26:56,300 --> 00:27:00,580 In the mid-1980s, the introduction of computers to the City 405 00:27:00,580 --> 00:27:03,860 made data about prices accessible. 406 00:27:03,860 --> 00:27:06,900 Harding had to develop the tools to analyse it. 407 00:27:08,340 --> 00:27:11,060 At that stage in my life, I could program a computer! 
408 00:27:11,060 --> 00:27:13,820 HE LAUGHS I could program a computer, 409 00:27:13,820 --> 00:27:16,260 I could read the data from the new exchange, 410 00:27:16,260 --> 00:27:19,300 I could conduct analysis of that data 411 00:27:19,300 --> 00:27:22,620 and that, to me, was rather an elementary thing to do. 412 00:27:22,620 --> 00:27:25,620 I was surprised that other people hadn't done it first. 413 00:27:25,620 --> 00:27:26,820 You'd have thought that, 414 00:27:26,820 --> 00:27:29,700 where all the millions and billions are all sloshing around, 415 00:27:29,700 --> 00:27:33,420 you'd have thought that lots of rational, intelligent people 416 00:27:33,420 --> 00:27:35,860 would have done these sorts of things. 417 00:27:47,340 --> 00:27:50,580 The company David Harding founded 20 years ago 418 00:27:50,580 --> 00:27:55,060 now invests billions of pounds on the basis of data. 419 00:27:55,060 --> 00:27:57,340 That is a lovely dataset you've created, 420 00:27:57,340 --> 00:27:59,300 that's why I was waxing rather lyrical. 421 00:27:59,300 --> 00:28:00,820 You might just find a pattern! 422 00:28:08,180 --> 00:28:10,300 And that's a large dataset. 423 00:28:10,300 --> 00:28:12,740 That's a lot of stocks on a lot of dates. 424 00:28:12,740 --> 00:28:17,180 Harding is now far from the only scientist in the City. 425 00:28:17,180 --> 00:28:19,220 His company alone employs 426 00:28:19,220 --> 00:28:22,860 over 100 scientifically trained data hunters, 427 00:28:22,860 --> 00:28:27,940 from astrophysicists to cosmologists, 428 00:28:27,940 --> 00:28:32,860 to mathematicians and meteorologists. 429 00:28:32,860 --> 00:28:38,140 They've become known as quants. 430 00:28:38,140 --> 00:28:39,540 Well, there's the joke which is, 431 00:28:39,540 --> 00:28:41,260 what do you call a nerd in 20 years' time? 432 00:28:41,260 --> 00:28:42,860 And the answer is "Boss," you know! 
433 00:28:42,860 --> 00:28:45,860 It reminds me of Bill Gates 434 00:28:45,860 --> 00:28:47,900 who said at any other point in history 435 00:28:47,900 --> 00:28:50,420 he would have been sabre-toothed tiger food. 436 00:28:53,340 --> 00:28:57,380 His company is built around the idea that if you have enough data 437 00:28:57,380 --> 00:28:59,140 and the expertise to read it, 438 00:28:59,140 --> 00:29:02,660 you can spot trends and links that no-one else has noticed. 439 00:29:05,740 --> 00:29:08,740 He and his analysts can seek out patterns 440 00:29:08,740 --> 00:29:11,540 in anything that is bought and sold. 441 00:29:11,540 --> 00:29:14,500 Take, for example, coffee. 442 00:29:14,500 --> 00:29:17,020 Obviously, they will probably almost certainly 443 00:29:17,020 --> 00:29:18,540 sell less coffee on a Sunday. 444 00:29:18,540 --> 00:29:24,140 Now that's not a revelation, or that they sell more coffee in winter, 445 00:29:24,140 --> 00:29:26,660 because people are indoors more often in winter, 446 00:29:26,660 --> 00:29:30,420 but there is an art or a science or a skill which is using the data 447 00:29:30,420 --> 00:29:32,420 to find out more interesting things 448 00:29:32,420 --> 00:29:34,380 and I'm sure that if my analysts went to work, 449 00:29:34,380 --> 00:29:36,980 we could find out much more interesting things than that. 450 00:29:39,340 --> 00:29:42,700 The process begins with data, collecting any information 451 00:29:42,700 --> 00:29:45,300 that might be relevant to the cost of coffee. 452 00:29:45,300 --> 00:29:48,940 The data, you can't hear it and you can't see it. 453 00:29:48,940 --> 00:29:51,260 You need specialised tools 454 00:29:51,260 --> 00:29:55,220 to interrogate and take decisions about that data 455 00:29:55,220 --> 00:29:57,900 and those tools are not the eye and the ear. 456 00:29:57,900 --> 00:29:59,900 They are the modern computer. 
457 00:30:02,220 --> 00:30:04,220 Algorithms can then search the data, 458 00:30:04,220 --> 00:30:06,140 looking for factors that link 459 00:30:06,140 --> 00:30:11,100 to the rises and falls in coffee prices. 460 00:30:11,100 --> 00:30:14,340 The yield of coffee bean harvests for example, 461 00:30:14,340 --> 00:30:16,180 the strengths of the economies 462 00:30:16,180 --> 00:30:18,980 and currencies of coffee-producing countries, 463 00:30:18,980 --> 00:30:22,180 as well as consumer demand for coffee. 464 00:30:22,180 --> 00:30:26,780 In the vast dataset, tiny significant signals appear 465 00:30:26,780 --> 00:30:29,620 and it is these signals which hold the clues 466 00:30:29,620 --> 00:30:31,900 to when to sell and when to buy. 467 00:30:34,900 --> 00:30:36,980 The idea of the exercise is 468 00:30:36,980 --> 00:30:39,860 to read in the data on all the companies around the world, 469 00:30:39,860 --> 00:30:44,540 analyse that data using rigorous scientific methods 470 00:30:44,540 --> 00:30:48,420 and make sensible, rational inferences from that data, 471 00:30:48,420 --> 00:30:51,580 not just take decisions on the basis of human feelings 472 00:30:51,580 --> 00:30:54,980 and how you feel today and what you heard from your friend 473 00:30:54,980 --> 00:30:56,540 and so on and so forth, 474 00:30:56,540 --> 00:30:59,700 but really bringing to bear the scientific method much more. 475 00:30:59,700 --> 00:31:04,140 It's a strange mathematical social science, but science, it is. 476 00:31:09,340 --> 00:31:10,660 Here, they gather data 477 00:31:10,660 --> 00:31:15,300 across hundreds of markets going right back in time. 478 00:31:15,300 --> 00:31:18,500 Daily metal prices from 1910, 479 00:31:18,500 --> 00:31:20,740 food prices dating to the Middle Ages, 480 00:31:20,740 --> 00:31:25,260 and London Stock Exchange prices stretching back to 1690. 
481 00:31:27,420 --> 00:31:30,060 And every day, they collect new data 482 00:31:30,060 --> 00:31:33,140 on 28,000 companies across the world. 483 00:31:35,020 --> 00:31:37,140 We have data coming in 484 00:31:37,140 --> 00:31:41,100 almost 24 hours a day for nearly all the markets we trade, 485 00:31:41,100 --> 00:31:43,980 and the last time I looked, we had something like 486 00:31:43,980 --> 00:31:47,220 40 terabytes of data in our database, 487 00:31:47,220 --> 00:31:52,140 and that's the equivalent of about 70 million King James Bibles. 488 00:31:52,140 --> 00:31:55,740 The ambition is that somewhere in this 40 terabytes of data 489 00:31:55,740 --> 00:32:00,620 there are patterns that can be used to predict price rises and falls, 490 00:32:00,620 --> 00:32:04,700 and you don't need to predict price changes with pinpoint accuracy. 491 00:32:04,700 --> 00:32:07,740 The odds just need to be a bit better than even. 492 00:32:07,740 --> 00:32:09,740 If you throw a coin and there's a 50/50 chance 493 00:32:09,740 --> 00:32:11,700 of it landing heads or tails, 494 00:32:11,700 --> 00:32:14,820 then clearly, there's no way of profiting from that. 495 00:32:14,820 --> 00:32:17,260 If however, we had the ability to know 496 00:32:17,260 --> 00:32:19,020 that heads was going to come up 497 00:32:19,020 --> 00:32:21,780 52% of the time or 53% of the time, 498 00:32:21,780 --> 00:32:23,900 then that would be a great investment business. 499 00:32:23,900 --> 00:32:26,260 You should look closer to the data, 500 00:32:26,260 --> 00:32:29,580 then there is something which looks a bit bizarre. 501 00:32:29,580 --> 00:32:31,100 First... 502 00:32:31,100 --> 00:32:34,740 If you have the resources and can make enough investments, 503 00:32:34,740 --> 00:32:40,340 spotting even a tiny variation can lead to large profits. 504 00:32:40,340 --> 00:32:42,340 Over the last 20 years, 505 00:32:42,340 --> 00:32:46,100 this approach has paid handsomely for David Harding. 
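The coin-toss argument above can be put into numbers. The following is only a minimal sketch under stated assumptions, not a description of how Harding's company trades: it assumes even-money bets of a fixed unit stake, with independent outcomes, and the function names are invented for illustration.

```python
import math
from fractions import Fraction


def expected_profit_per_bet(p_win: Fraction, stake: int = 1) -> Fraction:
    """Average profit of one even-money bet (assumption: win +stake, lose -stake)."""
    return p_win * stake - (1 - p_win) * stake


def edge_to_noise(p_win: Fraction, n_bets: int) -> float:
    """Expected total profit divided by its standard deviation.

    Assumes n_bets independent unit-stake bets. Each outcome is +1 or -1,
    so its second moment is 1 and its variance is 1 - mean**2.
    """
    mean = expected_profit_per_bet(p_win)
    variance = 1 - mean ** 2
    return float(mean) * n_bets / math.sqrt(float(variance) * n_bets)


if __name__ == "__main__":
    for p in (Fraction(50, 100), Fraction(52, 100), Fraction(53, 100)):
        print(p, "average profit per unit staked:", float(expected_profit_per_bet(p)))
    for n in (100, 10_000):
        print(n, "bets at 52%, edge relative to noise:", round(edge_to_noise(Fraction(52, 100), n), 2))
```

A fair coin averages exactly zero, while a 52% coin averages 0.04 per unit staked. Under these assumptions the edge only stands out from the noise once the number of bets is large, which is one way to read the remark about having the resources to make enough investments.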
506 00:32:46,100 --> 00:32:49,620 There's never really a point at which you can relax 507 00:32:49,620 --> 00:32:53,540 and sit back and go, "There, I have proved my point!" 508 00:32:53,540 --> 00:32:58,020 Of course, you know, over the years the ideas have been successful, 509 00:32:58,020 --> 00:32:59,540 the company has grown. 510 00:32:59,540 --> 00:33:02,140 It gives me great pride and satisfaction. 511 00:33:08,460 --> 00:33:12,940 Of course, investing in financial markets remains a gamble. 512 00:33:12,940 --> 00:33:17,420 There is no universal law of finance. 513 00:33:17,420 --> 00:33:23,020 Stock market crashes, recessions, they're clearly not easy to predict. 514 00:33:23,020 --> 00:33:27,220 The patterns in the data are constantly shifting and changing. 515 00:33:27,220 --> 00:33:30,140 There is no one right answer. 516 00:33:30,140 --> 00:33:31,580 Every day, week or month, 517 00:33:31,580 --> 00:33:35,900 you are being proven wrong by having your ideas put to the test, 518 00:33:35,900 --> 00:33:38,700 and that is a gift because it enables you 519 00:33:38,700 --> 00:33:40,300 to maintain a level of humility 520 00:33:40,300 --> 00:33:46,060 that people may, in other situations, lose, 521 00:33:46,060 --> 00:33:50,260 and humility is actually a vital ingredient 522 00:33:50,260 --> 00:33:53,580 of proper scientific investigation. 523 00:33:53,580 --> 00:33:58,260 I think most good scientists tend to be quite humble people. 524 00:34:00,860 --> 00:34:02,900 The world of finance has been changed forever 525 00:34:02,900 --> 00:34:04,420 by the data revolution. 526 00:34:07,180 --> 00:34:10,140 The effects have spilled over into everyday life. 527 00:34:15,900 --> 00:34:19,060 And the data revolution is set to become even more personal. 528 00:34:22,940 --> 00:34:26,660 The fastest growing dataset of all is the one being created by you. 
529 00:34:28,900 --> 00:34:33,940 Every time we call, text, search, travel, buy, 530 00:34:33,940 --> 00:34:37,300 we add to the data mountain. 531 00:34:37,300 --> 00:34:40,580 All told, it's growing by 2.5 billion gigabytes every day. 532 00:34:44,300 --> 00:34:46,820 All that data is valuable, 533 00:34:46,820 --> 00:34:52,780 and it's brought out the data hunters, like Mike Baker. 534 00:34:52,780 --> 00:34:55,500 The volume of it, the dynamic nature of the data 535 00:34:55,500 --> 00:34:57,540 is changing how we live our lives 536 00:34:57,540 --> 00:35:01,220 and if you collect this information over millions of people, 537 00:35:01,220 --> 00:35:05,300 you can start to guess what they may be interested in next. 538 00:35:08,100 --> 00:35:11,980 He saw an opportunity to bring the data revolution 539 00:35:11,980 --> 00:35:14,020 to the world of advertising. 540 00:35:14,020 --> 00:35:18,060 Instead of relying on customers seeing a billboard, 541 00:35:18,060 --> 00:35:21,420 it was now possible to beam the adverts directly to them. 542 00:35:23,540 --> 00:35:25,980 We started to look and think about all of the data. 543 00:35:25,980 --> 00:35:29,220 If we collected enough about past behaviour, 544 00:35:29,220 --> 00:35:33,740 could it be predictive in a way that would be useful for a business, 545 00:35:33,740 --> 00:35:35,940 in terms of trying to connect to people? 546 00:35:37,660 --> 00:35:39,380 Mike wanted to mine this data, 547 00:35:39,380 --> 00:35:42,660 to predict what people might want to buy. 548 00:35:46,300 --> 00:35:49,900 His first hurdle was how to search through the vast amount of data 549 00:35:49,900 --> 00:35:51,860 we produce every day 550 00:35:51,860 --> 00:35:56,660 to find the tiny signals of our consumer interest. 551 00:35:56,660 --> 00:35:59,500 I quickly realised that a big part of the problem 552 00:35:59,500 --> 00:36:01,060 was actually the math. 
553 00:36:01,060 --> 00:36:03,140 It was clear there were no systems, 554 00:36:03,140 --> 00:36:05,780 not even really mathematical constructs, 555 00:36:05,780 --> 00:36:09,780 where you could capture the information, make sense of it 556 00:36:09,780 --> 00:36:14,900 and then turn around and create actions 557 00:36:14,900 --> 00:36:17,980 across hundreds of millions of people simultaneously. 558 00:36:21,020 --> 00:36:24,420 As if capturing the vast dataset created by mobile computing 559 00:36:24,420 --> 00:36:26,340 wasn't challenge enough, 560 00:36:26,340 --> 00:36:30,860 Mike also wanted to mine it virtually instantaneously. 561 00:36:30,860 --> 00:36:34,500 He wanted to find hints of what people might want to buy 562 00:36:34,500 --> 00:36:37,780 even before they'd realised it themselves. 563 00:36:37,780 --> 00:36:39,900 He needed to find a collaborator. 564 00:36:50,660 --> 00:36:54,300 The ideal partner for Mike came from a completely different world. 565 00:37:00,500 --> 00:37:04,900 Bill Simmons was an aerospace engineer at MIT. 566 00:37:04,900 --> 00:37:09,740 He was working on one of NASA's most ambitious tasks of all time, 567 00:37:09,740 --> 00:37:11,980 a potential manned mission to Mars. 568 00:37:14,660 --> 00:37:16,940 A mission to Mars is extremely complex, 569 00:37:16,940 --> 00:37:19,220 especially if you include people, 570 00:37:19,220 --> 00:37:21,980 and it gets very hard if you want to bring the people back. 571 00:37:23,980 --> 00:37:26,020 Bill's team started to work out 572 00:37:26,020 --> 00:37:30,820 how to plan all the elements necessary for a manned Mars mission, 573 00:37:30,820 --> 00:37:32,940 and discovered the real problem 574 00:37:32,940 --> 00:37:37,140 was that there were so many different options to choose from. 575 00:37:37,140 --> 00:37:41,460 We found there were about 35 different major decisions, 576 00:37:41,460 --> 00:37:43,940 and many, many, small decisions that follow. 
577 00:37:43,940 --> 00:37:48,060 For things like how many crew, what kind of propellant to use, 578 00:37:48,060 --> 00:37:52,060 how many rockets, big ones or small ones, what kind of orbit trajectory? 579 00:37:52,060 --> 00:37:55,300 So you add all those up and all the different possible choices 580 00:37:55,300 --> 00:38:02,340 you can make was 35 billion different possible Mars missions. 581 00:38:02,340 --> 00:38:03,620 And that would have taken, 582 00:38:03,620 --> 00:38:06,060 if we were to go through all 35 billion, 583 00:38:06,060 --> 00:38:09,700 it would have taken infinite time to find one that works. 584 00:38:13,820 --> 00:38:16,500 NASA needed a way to narrow down the possibilities. 585 00:38:20,860 --> 00:38:24,340 Bill turned to decision theory. 586 00:38:24,340 --> 00:38:28,380 It's a complex branch of maths but the principle is the same 587 00:38:28,380 --> 00:38:32,580 as something really quite simple - shopping. 588 00:38:37,220 --> 00:38:41,780 Even buying dinner for two, you've got thousands of decisions to make. 589 00:38:41,780 --> 00:38:45,020 You could take all day. 590 00:38:45,020 --> 00:38:47,500 You could try every food, 591 00:38:47,500 --> 00:38:50,220 and it would take you hundreds of years 592 00:38:50,220 --> 00:38:53,380 to see every combination of apples and, I don't know, 593 00:38:53,380 --> 00:38:58,540 mustard or pears and bananas. 594 00:39:03,460 --> 00:39:07,820 To make it simple, you can apply the principle of decision theory. 595 00:39:10,020 --> 00:39:12,940 You can make decisions about things in many different orders. 596 00:39:12,940 --> 00:39:14,900 If you want to decide what to make for dinner, 597 00:39:14,900 --> 00:39:16,620 you can decide what food you like first 598 00:39:16,620 --> 00:39:19,420 or you can decide what tools you're going to use. 
599 00:39:19,420 --> 00:39:22,020 So you could say, "I'm going to cook things with a spatula," 600 00:39:22,020 --> 00:39:25,420 and then you have...it doesn't really narrow things down for you. 601 00:39:28,100 --> 00:39:31,980 The trick is to put your decisions in the right order. 602 00:39:31,980 --> 00:39:34,620 If you take big decisions first, 603 00:39:34,620 --> 00:39:39,140 you eliminate a lot of smaller decisions and speed up the process. 604 00:39:42,180 --> 00:39:45,340 I did bring a plan. I'll show it to you. 605 00:39:45,340 --> 00:39:48,020 This is, um... 606 00:39:48,020 --> 00:39:50,700 I have three different kinds of recipes. 607 00:39:50,700 --> 00:39:54,500 I can either make salmon, a white fish or branzini, 608 00:39:54,500 --> 00:39:56,740 three of my favourite recipes. 609 00:39:56,740 --> 00:40:00,020 If I choose salmon, I'll need mustard and capers and lemon. 610 00:40:00,020 --> 00:40:02,980 If I choose white fish, parsley, eggs and lemon. 611 00:40:02,980 --> 00:40:06,260 And branzini, lemon and rosemary. 612 00:40:06,260 --> 00:40:09,620 So here we are at the seafood section. 613 00:40:09,620 --> 00:40:13,740 Looking around, I see they have some very nice fresh Atlantic salmon 614 00:40:13,740 --> 00:40:16,060 and I think that's what I'll buy. 615 00:40:16,060 --> 00:40:18,580 PROGRAMME-MAKER: You strike me as a very organised guy. 616 00:40:18,580 --> 00:40:21,300 Is that a typical Bill thing to do a list like that? 617 00:40:21,300 --> 00:40:25,300 Yes, this is. You know, studying decision theory, 618 00:40:25,300 --> 00:40:28,420 this is how I think about things. 619 00:40:28,420 --> 00:40:31,460 So now the rest of my plan is set in motion. 620 00:40:31,460 --> 00:40:37,380 All I need to do is buy mustard, capers, lemon and some salad, 621 00:40:37,380 --> 00:40:39,580 and possibly a side dish, if I see something I like. 
622 00:40:42,020 --> 00:40:45,700 Decision theory, which works so well on a shopping trip, 623 00:40:45,700 --> 00:40:49,260 can also be applied to the 35 billion decisions 624 00:40:49,260 --> 00:40:50,700 in a manned Mars mission. 625 00:40:52,460 --> 00:40:55,260 If the first decision only had two choices, 626 00:40:55,260 --> 00:40:58,100 you could have two crew or three crew, 627 00:40:58,100 --> 00:41:00,500 if you find after a few more decisions 628 00:41:00,500 --> 00:41:04,340 that two crew is not possible, 629 00:41:04,340 --> 00:41:05,820 it won't work, because you need 630 00:41:05,820 --> 00:41:10,180 at least two people in the lander and one person in orbit, 631 00:41:10,180 --> 00:41:11,900 then you've eliminated essentially, 632 00:41:11,900 --> 00:41:14,740 if you made that decision first, early enough in the process, 633 00:41:14,740 --> 00:41:18,820 you've eliminated half of the permutations you need to look at. 634 00:41:18,820 --> 00:41:21,260 So this increases your speed by half, 635 00:41:21,260 --> 00:41:25,940 and if you continue to use this process over and over again, 636 00:41:25,940 --> 00:41:28,180 you continue to speed up your decision process, 637 00:41:28,180 --> 00:41:32,180 doubling every time, for example, so it becomes exponentially faster. 638 00:41:35,540 --> 00:41:38,220 Bill created a decision-making algorithm 639 00:41:38,220 --> 00:41:40,820 which was able to process information, 640 00:41:40,820 --> 00:41:45,820 putting the decisions that narrowed down the most options first. 641 00:41:47,860 --> 00:41:52,340 The 35 billion decisions fell to just over 1,000. 642 00:41:54,340 --> 00:41:57,340 It was a revolution in the speed of data processing. 643 00:42:02,180 --> 00:42:05,500 Mike Baker realised Bill's decision-making model 644 00:42:05,500 --> 00:42:09,540 was just what he had been looking for. 
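The effect of decision order described above can be sketched with a toy search. This is an illustration only, not Bill Simmons's actual algorithm: the crew rule (at least three people, two in the lander and one in orbit) comes from the interview, while the propellant and rocket options are hypothetical placeholders.

```python
def count_visited(decisions, is_viable):
    """Depth-first search over decisions taken in the given order.

    decisions: list of (name, options) pairs.
    is_viable: returns False as soon as a partial plan cannot work.
    Returns (partial choices examined, complete viable plans found).
    """
    visited = 0
    plans = 0

    def walk(index, partial):
        nonlocal visited, plans
        if index == len(decisions):
            plans += 1
            return
        name, options = decisions[index]
        for option in options:
            visited += 1
            partial[name] = option
            if is_viable(partial):
                walk(index + 1, partial)  # only explore below viable choices
            del partial[name]

    walk(0, {})
    return visited, plans


def needs_three_crew(partial):
    # Rule from the interview: two people in the lander plus one in orbit.
    # A plan that has not chosen a crew size yet is still considered viable.
    return partial.get("crew", 3) >= 3


crew = ("crew", [2, 3])
propellant = ("propellant", ["A", "B", "C"])  # hypothetical options
rockets = ("rockets", [1, 2, 3, 4])           # hypothetical options

if __name__ == "__main__":
    print("crew decided first:", count_visited([crew, propellant, rockets], needs_three_crew))
    print("crew decided last: ", count_visited([propellant, rockets, crew], needs_three_crew))
```

Both orders find the same 12 workable plans, but deciding crew size first examines 17 partial choices against 39 when it is decided last, because the unworkable two-crew branch is discarded before any of its follow-on decisions are explored.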
645 00:42:09,540 --> 00:42:12,020 They joined forces and adapted 646 00:42:12,020 --> 00:42:15,500 Bill's super-fast decision-making machine. 647 00:42:15,500 --> 00:42:20,540 Now it scans the billions of bits of data we produce, 648 00:42:20,540 --> 00:42:23,140 quickly finding clues to what we might buy, 649 00:42:23,140 --> 00:42:24,580 then sends a personalised advert 650 00:42:24,580 --> 00:42:28,060 from one of their advertising clients. 651 00:42:28,060 --> 00:42:31,140 We're processing hundreds of thousands of advertisements 652 00:42:31,140 --> 00:42:34,180 per second, potential advertisements, 653 00:42:34,180 --> 00:42:37,220 and determining within 100 milliseconds, 654 00:42:37,220 --> 00:42:40,860 so one tenth of a second, much faster than the blink of an eye, 655 00:42:40,860 --> 00:42:45,100 whether that advertisement is good for any one of our clients. 656 00:42:45,100 --> 00:42:47,740 The models learn what you might be tempted to buy, 657 00:42:47,740 --> 00:42:51,420 and where and when you might buy it. 658 00:42:51,420 --> 00:42:54,620 They all work in concert and they pick up on patterns, 659 00:42:54,620 --> 00:42:57,460 so they see the same anonymised user 660 00:42:57,460 --> 00:43:01,260 triggering similar behaviours over and over again. 661 00:43:01,260 --> 00:43:05,260 The machine learns this is a person who likes Italian food, 662 00:43:05,260 --> 00:43:09,780 interested in sedans, and likes rock music from the '60s. 663 00:43:09,780 --> 00:43:13,340 The data analysts predicting what you might buy 664 00:43:13,340 --> 00:43:18,060 are creating a world of personalised advertising. 665 00:43:18,060 --> 00:43:20,420 If you choose not to personalise the advertising, 666 00:43:20,420 --> 00:43:21,900 you'll still get advertising. 667 00:43:21,900 --> 00:43:23,940 It's not a choice to have no advertising. 
668 00:43:23,940 --> 00:43:26,180 It's just that it'll be less relevant to you 669 00:43:26,180 --> 00:43:28,220 and, you know, potentially more annoying. 670 00:43:28,220 --> 00:43:31,420 We're all familiar with what that's like to see something very annoying. 671 00:43:31,420 --> 00:43:33,460 I saw some today at my house. 672 00:43:33,460 --> 00:43:35,180 I think it was erectile dysfunction. 673 00:43:35,180 --> 00:43:36,860 Totally irrelevant to me! 674 00:43:42,540 --> 00:43:46,660 And advertising is just the start of exploiting our personal data mines. 675 00:43:53,380 --> 00:43:57,660 Even the most insignificant data of everyday life is being mined, 676 00:43:57,660 --> 00:44:00,580 with potentially life-saving consequences. 677 00:44:04,340 --> 00:44:08,020 Cathy Sigona is a retired school principal in San Francisco. 678 00:44:11,900 --> 00:44:15,660 She has a condition called atrial fibrillation, 679 00:44:15,660 --> 00:44:17,940 which makes her heart beat irregularly. 680 00:44:20,780 --> 00:44:23,220 It felt like a big fish in my chest. 681 00:44:23,220 --> 00:44:25,900 And it was one side here, 682 00:44:25,900 --> 00:44:28,540 and then it would just bounce back and forth, 683 00:44:28,540 --> 00:44:32,060 and what can happen is the blood can pool 684 00:44:32,060 --> 00:44:34,820 and that can cause a clot 685 00:44:34,820 --> 00:44:36,300 which then can cause a stroke. 686 00:44:36,300 --> 00:44:42,740 So that's where the real seriousness lies, 687 00:44:42,740 --> 00:44:44,700 is the fact that I could stroke out. 688 00:44:47,220 --> 00:44:50,660 The causes of atrial fibrillation are unknown, 689 00:44:50,660 --> 00:44:53,620 so predicting when episodes may occur is vital. 690 00:44:53,620 --> 00:44:56,580 Hi, Nanette, this is Cathy. 691 00:44:56,580 --> 00:44:59,860 So Cathy is about to take part in a trial. 
692 00:44:59,860 --> 00:45:04,260 Her doctor is going to monitor her symptoms using data extracted 693 00:45:04,260 --> 00:45:06,300 from how she uses her mobile phone. 694 00:45:09,020 --> 00:45:14,020 Dr Jeff Olgin is Cathy's cardiologist. 695 00:45:14,020 --> 00:45:15,820 Because the mobile phone has become 696 00:45:15,820 --> 00:45:19,420 such an integral part of people's lives, 697 00:45:19,420 --> 00:45:23,780 it's with them most of the day and most of the time, 698 00:45:23,780 --> 00:45:31,260 so that becomes a very good real-time data collector for them. 699 00:45:31,260 --> 00:45:36,420 Dr Olgin is trialling software that will record Cathy's daily behaviour. 700 00:45:36,420 --> 00:45:41,460 Any changes to her usual routine might indicate she's unwell. 701 00:45:41,460 --> 00:45:43,820 As a really practical, simple example, 702 00:45:43,820 --> 00:45:47,300 let's say you get up and go to work every week day at 7 o'clock. 703 00:45:47,300 --> 00:45:50,180 If all of a sudden that's changed, 704 00:45:50,180 --> 00:45:52,860 we'll notice a difference in your behavioural pattern 705 00:45:52,860 --> 00:45:55,460 that might trigger us to say, you know, "What's going on?" 706 00:45:55,460 --> 00:45:57,900 And there's lots of fun things that sort of pop up... 707 00:45:57,900 --> 00:46:01,700 Algorithms in the software will search Cathy's data, 708 00:46:01,700 --> 00:46:04,380 and if they find signals of abnormal behaviour, 709 00:46:04,380 --> 00:46:07,420 they will trigger an alert to Dr Olgin. 710 00:46:07,420 --> 00:46:09,820 It could be a life-saver. 711 00:46:09,820 --> 00:46:13,140 Hopefully in relation to atrial fibrillation in particular, 712 00:46:13,140 --> 00:46:16,180 hopefully we will be able to identify behaviours 713 00:46:16,180 --> 00:46:20,620 or behavioural patterns that might predict an episode. 
714 00:46:20,620 --> 00:46:25,700 Our personal data trails can be used to peer into our behaviour, 715 00:46:25,700 --> 00:46:27,380 discovering clues to illness. 716 00:46:27,380 --> 00:46:34,060 And so if we can find a cause that we can fix down the road, 717 00:46:34,060 --> 00:46:37,060 and I'm not talking the next couple of weeks, 718 00:46:37,060 --> 00:46:38,940 but in the next couple of years, 719 00:46:38,940 --> 00:46:43,020 that we can start alleviating some of the stresses 720 00:46:43,020 --> 00:46:47,060 that cause me to have atrial fib, I would be extremely pleased. 721 00:46:47,060 --> 00:46:48,660 I have a lot of life left. 722 00:46:57,100 --> 00:46:59,860 The idea of predictive and personalised medicine 723 00:46:59,860 --> 00:47:03,100 is coming closer than ever before. 724 00:47:03,100 --> 00:47:07,340 And it's the data we have from the moment we're conceived 725 00:47:07,340 --> 00:47:09,740 that will make this idea a reality. 726 00:47:14,420 --> 00:47:19,500 Professor Beales' clinic relies on the biggest human dataset of all, 727 00:47:19,500 --> 00:47:20,740 the human genome. 728 00:47:22,580 --> 00:47:25,860 Just 20 years ago, his work would have been all but impossible 729 00:47:25,860 --> 00:47:29,300 but now he can analyse his patients' DNA 730 00:47:29,300 --> 00:47:34,380 to pinpoint the genetic mutations causing disease. 731 00:47:34,380 --> 00:47:37,580 We still have a myriad of diseases, particularly at this hospital, 732 00:47:37,580 --> 00:47:39,260 where there are many, many children 733 00:47:39,260 --> 00:47:43,700 who do not have yet a diagnosis for their often rare condition, 734 00:47:43,700 --> 00:47:47,660 and I think at the moment, one of the things we really need to do 735 00:47:47,660 --> 00:47:53,220 is to be able to sequence as many of these children as possible 736 00:47:53,220 --> 00:47:56,260 so that we can begin to unravel a lot of these mysteries. 
737 00:47:59,820 --> 00:48:04,820 Genetic diagnosis has already helped identify new conditions, 738 00:48:04,820 --> 00:48:07,500 allowing doctors to devise new treatments 739 00:48:07,500 --> 00:48:10,860 and research cures that promise to improve our lives. 740 00:48:15,180 --> 00:48:18,620 And so far, we only really understand 741 00:48:18,620 --> 00:48:20,660 about one percent of our genome. 742 00:48:23,540 --> 00:48:27,140 These volumes represent the whole of the human genome, 743 00:48:27,140 --> 00:48:29,620 the coding element of the human genome. 744 00:48:29,620 --> 00:48:33,500 In other words, the sequence of all of the letters 745 00:48:33,500 --> 00:48:37,020 that go to make up a single human being. 746 00:48:37,020 --> 00:48:39,420 This is a huge discovery. 747 00:48:39,420 --> 00:48:41,620 However, it is just the tip of the iceberg. 748 00:48:44,180 --> 00:48:49,340 The medical use of our DNA data is in its infancy. 749 00:48:49,340 --> 00:48:52,700 We're just beginning to glimpse the 99% of the genome 750 00:48:52,700 --> 00:48:55,260 which we used to think was junk, 751 00:48:55,260 --> 00:48:59,380 but now realise is vitally important. 752 00:48:59,380 --> 00:49:04,420 So the 99% of the genome that's left for us to understand 753 00:49:04,420 --> 00:49:07,140 is going to represent a huge task. 754 00:49:07,140 --> 00:49:09,900 There's an enormous amount of information in there 755 00:49:09,900 --> 00:49:12,140 and we have to be able to relearn, 756 00:49:12,140 --> 00:49:15,860 we have to actually be able to develop new tools 757 00:49:15,860 --> 00:49:18,500 to be able to understand the code 758 00:49:18,500 --> 00:49:22,300 that's hidden within that vast chunk of the genome. 759 00:49:25,100 --> 00:49:27,940 But even the huge dataset of the human genome 760 00:49:27,940 --> 00:49:30,180 is dwarfed by the one that has its roots 761 00:49:30,180 --> 00:49:32,460 in the very first data science. 762 00:49:34,820 --> 00:49:36,500 Astronomy. 
763 00:49:46,420 --> 00:49:49,940 For centuries, astronomers like Simon Ratcliffe 764 00:49:49,940 --> 00:49:51,460 have been collecting data 765 00:49:51,460 --> 00:49:55,020 from the billions of stars and galaxies in the night sky. 766 00:50:00,620 --> 00:50:03,700 In many ways, astronomy was the first of the natural sciences, 767 00:50:03,700 --> 00:50:05,700 and it was the Babylonians who kicked that off 768 00:50:05,700 --> 00:50:07,940 and they started to notice that it wasn't just random. 769 00:50:07,940 --> 00:50:09,140 There were patterns. 770 00:50:09,140 --> 00:50:11,780 There's certain things in the sky that seem to move over 771 00:50:11,780 --> 00:50:13,980 and they're always fixed, relative to each other. 772 00:50:13,980 --> 00:50:15,060 Those were the stars. 773 00:50:15,060 --> 00:50:17,900 Then they noticed that certain objects in the sky seemed to wander. 774 00:50:17,900 --> 00:50:19,340 That was the planets. 775 00:50:19,340 --> 00:50:22,300 And so what they did, as you do, is you record the movements. 776 00:50:22,300 --> 00:50:23,580 They wrote down this data 777 00:50:23,580 --> 00:50:26,340 and in recording that data over long periods of time, 778 00:50:26,340 --> 00:50:28,620 they were able to tease out the patterns inherent 779 00:50:28,620 --> 00:50:32,900 and that gave the ability to start to understand the universe. 780 00:50:32,900 --> 00:50:35,940 The science of astronomy was founded on data hunting. 781 00:50:42,460 --> 00:50:45,420 Astronomers use the patterns of nature, 782 00:50:45,420 --> 00:50:48,180 the predictability of stars, 783 00:50:48,180 --> 00:50:52,540 to unlock the secrets of the universe. 784 00:50:52,540 --> 00:50:55,420 At the moment, we have the Southern Cross to the left. 785 00:50:55,420 --> 00:50:57,740 We have Scorpio right in ascendance above us. 786 00:51:02,980 --> 00:51:10,140 Scorpio was first identified and named over 5,000 years ago. 
787 00:51:10,140 --> 00:51:11,540 And if you look closely, 788 00:51:11,540 --> 00:51:15,980 you can see a bright red star there called the Heart of Scorpio. 789 00:51:15,980 --> 00:51:20,140 That's a star called Antares, which is a super-giant. 790 00:51:20,140 --> 00:51:22,700 Now with more data, scientific equations 791 00:51:22,700 --> 00:51:24,500 and mathematical models, 792 00:51:24,500 --> 00:51:28,340 astronomers can forecast the fate of Antares. 793 00:51:28,340 --> 00:51:31,420 This is a fairly massive star that's getting towards the end of its life. 794 00:51:31,420 --> 00:51:34,340 What's going to happen is it's going to expend its nuclear fuel 795 00:51:34,340 --> 00:51:38,260 and basically collapse in on itself, and then form a black hole. 796 00:51:38,260 --> 00:51:41,500 So, if we look at this night sky, at this epic splendour above us, 797 00:51:41,500 --> 00:51:43,060 you don't just see stars. 798 00:51:43,060 --> 00:51:45,620 You see this kind of potential for discovery. 799 00:51:48,140 --> 00:51:50,500 Astronomers are only just beginning 800 00:51:50,500 --> 00:51:53,740 to unlock the potential of this vast dataset. 801 00:52:06,260 --> 00:52:10,620 Today, astronomers like Simon are using a new set of tools 802 00:52:10,620 --> 00:52:13,860 to mine the eternal dataset of the stars. 803 00:52:15,620 --> 00:52:18,140 And as these tools improve, 804 00:52:18,140 --> 00:52:21,980 they can detect more and more detail in the patterns of the universe. 805 00:52:23,900 --> 00:52:26,660 In some ways, beach-combing for shells is a bit like 806 00:52:26,660 --> 00:52:28,700 great astronomy at the moment. 807 00:52:28,700 --> 00:52:30,660 You know, we have a sort of wide plain, 808 00:52:30,660 --> 00:52:33,180 but we pick the low-hanging fruit. 809 00:52:33,180 --> 00:52:35,140 A big shell like this is pretty easy to pick up. 810 00:52:35,140 --> 00:52:39,460 You know this might be representative of what we could do 50 years ago. 
811 00:52:39,460 --> 00:52:41,580 And then we start to get down into smaller stuff, 812 00:52:41,580 --> 00:52:44,620 right down into the sand, into the heart of the matter, 813 00:52:44,620 --> 00:52:46,940 to a point where we can see something deeply hidden 814 00:52:46,940 --> 00:52:49,140 that we're really interested in. 815 00:52:49,140 --> 00:52:51,980 And the key to getting there is the next generation of data, 816 00:52:51,980 --> 00:52:53,500 really big data. 817 00:52:56,740 --> 00:53:01,620 Simon's challenge is to find new unmined data about the universe 818 00:53:01,620 --> 00:53:03,980 that will reveal new discoveries. 819 00:53:06,620 --> 00:53:11,300 His latest project promises to deliver exactly that. 820 00:53:23,100 --> 00:53:26,460 The key to it is a site deep in the Karoo, 821 00:53:26,460 --> 00:53:32,100 a broad semi-desert in South Africa's Northern Cape. 822 00:53:32,100 --> 00:53:35,140 We're about 200-odd kilometres away from Cape Town 823 00:53:35,140 --> 00:53:39,020 and there's still another maybe 500 to go before we get to the site, 824 00:53:39,020 --> 00:53:42,420 and as you can see, it's the road to nowhere, really. 825 00:53:51,380 --> 00:53:53,820 The KAT-7 array of radio telescopes 826 00:53:53,820 --> 00:53:56,100 are listening for electrical signals 827 00:53:56,100 --> 00:53:58,220 that have travelled billions of light years 828 00:53:58,220 --> 00:54:00,940 and are infinitesimally weak. 829 00:54:00,940 --> 00:54:04,380 We need to be really far away from people 830 00:54:04,380 --> 00:54:05,820 and the things that they do 831 00:54:05,820 --> 00:54:09,860 because anything modern really interferes with our observations. 832 00:54:09,860 --> 00:54:13,060 So people, their microwaves, their cell phones, their cars, 833 00:54:13,060 --> 00:54:16,340 all these things really drive us further and further away.
834 00:54:18,380 --> 00:54:20,460 The data KAT-7 has already catalogued 835 00:54:20,460 --> 00:54:24,100 has increased our knowledge of the universe. 836 00:54:24,100 --> 00:54:26,940 We've been imaging neutral hydrogen in our galaxy. 837 00:54:26,940 --> 00:54:29,140 We've been looking at transient events. 838 00:54:29,140 --> 00:54:32,220 We've looked at pulsars, but really, we're limited by data. 839 00:54:32,220 --> 00:54:34,460 We need more data to do better science. 840 00:54:34,460 --> 00:54:37,340 The signals Simon looks for are so small that, 841 00:54:37,340 --> 00:54:41,740 despite a combined detecting area of over 1,000 square metres, 842 00:54:41,740 --> 00:54:43,660 these seven telescopes capture 843 00:54:43,660 --> 00:54:46,020 just two megabits of data per second, 844 00:54:46,020 --> 00:54:49,340 and Simon's ambitions go far beyond that. 845 00:54:51,060 --> 00:54:52,580 I think really understanding 846 00:54:52,580 --> 00:54:54,540 how galaxies came to be the way they are, 847 00:54:54,540 --> 00:54:56,460 you know the evolution of the universe, 848 00:54:56,460 --> 00:54:58,820 I think that's one of the most exciting things 849 00:54:58,820 --> 00:55:01,860 we can anticipate addressing and to really answer the questions 850 00:55:01,860 --> 00:55:04,500 of how did the universe get to be as it is and where is it going? 851 00:55:04,500 --> 00:55:06,340 It's only achievable through big data. 852 00:55:06,340 --> 00:55:08,580 We really need to catalogue the entire universe. 853 00:55:08,580 --> 00:55:10,820 We have to figure out what it was like at every epoch 854 00:55:10,820 --> 00:55:12,780 and that's the only way to really understand 855 00:55:12,780 --> 00:55:14,740 how it evolved and where it's going. 856 00:55:16,700 --> 00:55:18,700 Life in the Karoo is about to change. 857 00:55:20,540 --> 00:55:24,780 These telescopes are set to be joined by more, thousands more.
858 00:55:31,820 --> 00:55:35,100 A new telescope array will fill the valley, 859 00:55:35,100 --> 00:55:38,860 covering a square kilometre, the biggest array in the world. 860 00:55:43,340 --> 00:55:44,860 Over the next ten to fifteen years, 861 00:55:44,860 --> 00:55:47,020 this valley is going to fill up with telescopes. 862 00:55:47,020 --> 00:55:48,220 As far as the eye can see, 863 00:55:48,220 --> 00:55:50,780 you'll see telescopes forming a vast array, 864 00:55:50,780 --> 00:55:53,020 bringing data, siphoning back into the Karoo 865 00:55:53,020 --> 00:55:55,860 where science is going to be done on an unprecedented scale. 866 00:56:14,180 --> 00:56:16,900 Work has now begun on the array. 867 00:56:16,900 --> 00:56:18,860 The new telescopes will receive 868 00:56:18,860 --> 00:56:22,100 30 terabytes of data per second. 869 00:56:22,100 --> 00:56:25,380 It will be the biggest data collector ever built. 870 00:56:27,140 --> 00:56:28,780 We're moving into the regime 871 00:56:28,780 --> 00:56:31,460 of unprecedented amounts of information. 872 00:56:31,460 --> 00:56:34,220 We have to take a step back from the data and think, 873 00:56:34,220 --> 00:56:36,220 "What are we trying to extract from the data? 874 00:56:36,220 --> 00:56:39,100 "What is the information that's actually contained therein?" 875 00:56:39,100 --> 00:56:40,700 And make sure that our tools 876 00:56:40,700 --> 00:56:42,580 and our techniques that we bring to bear 877 00:56:42,580 --> 00:56:44,260 look for the patterns in the data. 878 00:56:44,260 --> 00:56:46,820 This really requires a new breed of astronomers 879 00:56:46,820 --> 00:56:48,420 to see how we're going to change 880 00:56:48,420 --> 00:56:50,500 from where we are now to this next big shift. 881 00:56:52,700 --> 00:56:55,780 Simon Ratcliffe and his team have to develop a way 882 00:56:55,780 --> 00:57:00,780 to attain the important patterns in a huge flood of telescopic data. 
883 00:57:00,780 --> 00:57:01,900 If they can do it, 884 00:57:01,900 --> 00:57:05,140 they will discover the greatest secrets of our universe. 885 00:57:05,140 --> 00:57:07,860 So it's pretty easy to get lost in the challenge 886 00:57:07,860 --> 00:57:10,220 and the grand endeavour of the whole thing 887 00:57:10,220 --> 00:57:13,380 and feel, you know, you're kind of the master of the universe, 888 00:57:13,380 --> 00:57:16,740 sucking down and unlocking the secrets out there. 889 00:57:16,740 --> 00:57:18,900 You sort of sit here and think, 890 00:57:18,900 --> 00:57:22,420 "I'm this little small human and what right do I have 891 00:57:22,420 --> 00:57:24,660 "to go and pull these secrets out of the universe?" 892 00:57:24,660 --> 00:57:27,380 But that's our task. You know, that's what we're going to do 893 00:57:27,380 --> 00:57:30,340 and I think that this project and these data challenges 894 00:57:30,340 --> 00:57:34,020 really offer us that opportunity to understand fully our universe, 895 00:57:34,020 --> 00:57:35,780 where it came from and where it's going. 896 00:57:50,180 --> 00:57:52,740 The data revolution is transforming our world. 897 00:57:55,740 --> 00:57:59,940 We're devising ever more complex ways of gathering data 898 00:57:59,940 --> 00:58:03,860 and ever more ingenious ways of mining it. 899 00:58:06,100 --> 00:58:10,140 Data is becoming the most valuable commodity of the 21st century. 900 00:58:12,380 --> 00:58:16,580 The world of big data has arrived. 901 00:58:40,020 --> 00:58:43,460 Subtitles by Red Bee Media Ltd