Having decided to build a cyber-valet, it's probably a good idea to think about what such an assistant would be - which, in the case of something that is entirely servile, is probably synonymous with what it does. Zuck's Facebook post mentions a few things:
- Controlling the lights and temperature in his house
- Watching his daughter’s room for hazards
- Admitting guests based on facial recognition.
The first and last of these are relatively simple goals; so much so that there are already off-the-shelf products that have achieved them. The second is rather more challenging – object recognition is a really tough problem, especially when you’re trying to look for lots of different objects at once – and, as I don’t currently have any children, is something I might put aside for the moment.
Let’s add a few more things our valet should be able to do:
- Run a bath
- Make a pot of tea
- Create, manage and remind me of appointments
- Know my schedule and make recommendations for clothing, travel routes, etc.
- Handle quotidian correspondence
- Play and discover music, videos, etc.
- Draw the curtains
- Water the plants
- Track groceries/household goods
- Know when I’ve put something in the oven and when it’s done
- Answer trivia questions and source quotations
- Resolve intricate domestic dramas
- Read Spinoza.
Maybe not those last two. Not yet, anyway.
What you’ll notice from that little list, which is really just an expanded version of Zuck’s, is that there are lots of different problems to be solved. I think the best way to group these problems is as follows:
- Input
  - Verbal commands
  - Internet data
  - Sensor readings (light, heat, moisture, etc.)
- Processing
- Output
  - Software
    - Booking an appointment online
    - Sending an email
  - Hardware
    - Running a bath
    - Making toast
Input problems tend to be either trivial (sensor readings) or fantastically difficult (speech recognition). In both cases I’m happy to let someone else solve the problem for now. In fact, because wiring my entire house with microphones and/or buying and hacking a dozen Amazon Echos is a pretty extreme starting step, I’m going to pass all input to Mervyn (oh, I’m calling it Mervyn for now, because I’m terrible at names) in text form through a console, with various tags (sensor, speech, etc.) to simulate different types of input. The point is that input solutions should really be plug-and-play, and aren’t particularly relevant to the ‘core’ problem of having an AI that can actually understand what it’s receiving.
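As a rough sketch of what that tagged console input might look like, here is one way to turn a typed line into a structured event. The `tag: payload` format, the `InputEvent` name and the exact tag set are all my own assumptions for illustration; nothing about the final format is settled.

```python
from dataclasses import dataclass

# Hypothetical tag set; only "sensor" and "speech" are named explicitly above.
KNOWN_TAGS = {"sensor", "speech", "internet"}


@dataclass
class InputEvent:
    tag: str      # which kind of input this line simulates
    payload: str  # the raw text that follows the tag


def parse_line(line: str) -> InputEvent:
    """Split 'tag: payload' into an InputEvent; untagged text counts as speech."""
    tag, sep, rest = line.partition(":")
    tag = tag.strip().lower()
    if sep and tag in KNOWN_TAGS:
        return InputEvent(tag, rest.strip())
    # No recognised tag (or a stray colon, as in "10:30"): treat it all as speech.
    return InputEvent("speech", line.strip())
```

Typing `sensor: living_room.temp=19.5` would then arrive as a sensor event, while a bare question is treated as something I said out loud. Swapping the console for a real microphone later should only mean changing whatever produces these events.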
The same can be said for many of the output problems, particularly those that rely more heavily on hardware. It shouldn’t matter exactly which brand of computer-linked thermostat I have, or whether I’ve built my own; Mervyn should be able to handle it. If the other man can do it better, let him.
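One way to keep Mervyn indifferent to the brand of thermostat is to have it talk to a small interface rather than to a device. The sketch below is only an assumption about how that could look: `Thermostat`, `FakeThermostat` and `warm_up` are hypothetical names, and a real driver would replace the fake one.

```python
from typing import Protocol


class Thermostat(Protocol):
    """Anything that can report and set a target temperature in Celsius."""

    def current_temp(self) -> float: ...
    def set_target(self, celsius: float) -> None: ...


class FakeThermostat:
    """In-memory stand-in for a real device driver (hypothetical)."""

    def __init__(self, temp: float) -> None:
        self.temp = temp
        self.target = temp

    def current_temp(self) -> float:
        return self.temp

    def set_target(self, celsius: float) -> None:
        self.target = celsius


def warm_up(device: Thermostat, minimum: float) -> bool:
    """Raise the target to `minimum` if the room is colder; report whether it acted."""
    if device.current_temp() < minimum:
        device.set_target(minimum)
        return True
    return False
```

Because `warm_up` only relies on the two methods in the protocol, a shop-bought thermostat and a home-built one can be used interchangeably as long as each has a thin wrapper providing them.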
The place to start, then, is that unexpanded ‘Processing’ step in the middle, which is rather obvious. Some of this can be outsourced as well: there’s no need to build a trivia engine when Google search will do it for you, and a combination of TensorFlow and Google Inbox can be used to generate email replies. Mervyn doesn’t need to contain a massively deep neural network; it just needs to call on one from time to time.
So far I’ve mostly talked about what Mervyn won’t be. It will, of course, change over time, but for now I’m anticipating two main components. The first will be a rather messy flowchart – basically a collection of rules and procedures for handling various types of input. For example, if I ask whether or not I need an umbrella, Mervyn should look up where I’m going in the near future, check that against weather reports, and reply.
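To make the flowchart idea a little more concrete, here is a minimal sketch of the umbrella example as one rule in an ordered list of rules. The `SCHEDULE` and `FORECAST` dictionaries are made-up stand-ins for a calendar lookup and a weather report, and the 50% threshold is arbitrary.

```python
from typing import Callable, Optional

# Hypothetical stand-ins for a calendar lookup and a weather report.
SCHEDULE = {"next_location": "Leeds"}
FORECAST = {"Leeds": {"rain_probability": 0.7}, "Bath": {"rain_probability": 0.1}}


def umbrella_rule(text: str) -> Optional[str]:
    """Handle umbrella questions; return None if the input is not one."""
    if "umbrella" not in text.lower():
        return None
    place = SCHEDULE["next_location"]             # where am I going next?
    chance = FORECAST[place]["rain_probability"]  # what is the weather there?
    if chance >= 0.5:                             # arbitrary threshold
        return f"Yes, rain is likely in {place}."
    return f"No, rain is unlikely in {place}."


# Rules are tried in order; the first one that returns a reply wins.
RULES: list[Callable[[str], Optional[str]]] = [umbrella_rule]


def respond(text: str) -> str:
    for rule in RULES:
        reply = rule(text)
        if reply is not None:
            return reply
    return "Sorry, I don't know how to handle that yet."
```

Adding a new ability then means writing another function and appending it to `RULES`, which is about as messy as a flowchart gets but easy to grow.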
The second component will be a large, loosely structured database containing all the information that Mervyn needs to do its job. This should include big chains of relationships between items, as well as previous instructions, modifications to those instructions and so on. This database would be the thing that really made Mervyn a personal assistant, and would be different for each installation. Because it would inevitably include a fair amount of personal information, it’s important to state straight away that I would never, ever want this database to be accessible over the internet or shared with anyone. If you want to take Mervyn with you, you can clone it to a memory stick or load it onto your phone. Maybe, if I were absolutely sure I had the encryption nailed down properly, you could have remote access. But there’d never be any kind of corporate exploitation of that data – it would be entirely local and under the control of the user.
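I haven’t settled on a storage format, but a local SQLite file holding simple subject/relation/object facts would be one way to get those chains of relationships without anything leaving the machine. Everything below (the `facts` table, the function names, the tea example) is an illustrative assumption rather than a design decision.

```python
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open a local store; pass a file path to keep it on disk."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS facts (subject TEXT, relation TEXT, object TEXT)"
    )
    return conn


def add_fact(conn, subject, relation, obj):
    """Record one 'subject --relation--> object' link."""
    conn.execute("INSERT INTO facts VALUES (?, ?, ?)", (subject, relation, obj))


def follow(conn, start, relations):
    """Walk a chain of relations from `start`; return None if a link is missing."""
    current = start
    for relation in relations:
        row = conn.execute(
            "SELECT object FROM facts WHERE subject = ? AND relation = ?",
            (current, relation),
        ).fetchone()
        if row is None:
            return None
        current = row[0]
    return current


store = open_store()
add_fact(store, "me", "favourite_tea", "Earl Grey")
add_fact(store, "Earl Grey", "kept_in", "left cupboard")
```

Calling `follow(store, "me", ["favourite_tea", "kept_in"])` walks both links to find where my tea lives. Since the whole thing is a single file, cloning it to a memory stick is just a file copy.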
So, what next? Let’s start with the little things:
- Build the interface (a basic command line will do for now).
- Access, store and interpret weather data.
- Use that data to respond to weather queries.
This shouldn’t be particularly difficult (famous last words), and opens up nice avenues for further development – location tracking, scheduling, and so on.
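For the storing part, something as simple as a JSON file with a timestamp per location might be enough to begin with. This is a sketch under my own assumptions: the file layout, the three-hour staleness limit and the `rain_probability` field are all invented for the example, and fetching the report from an actual weather service is left out.

```python
import json
from pathlib import Path

MAX_AGE_SECONDS = 3 * 60 * 60  # assumed: a report older than three hours is stale


def save_report(path: Path, place: str, report: dict, now: float) -> None:
    """Store `report` for `place`, stamped with the time it was fetched."""
    data = json.loads(path.read_text()) if path.exists() else {}
    data[place] = {"fetched_at": now, "report": report}
    path.write_text(json.dumps(data))


def load_report(path: Path, place: str, now: float):
    """Return the stored report for `place`, or None if missing or stale."""
    if not path.exists():
        return None
    entry = json.loads(path.read_text()).get(place)
    if entry is None or now - entry["fetched_at"] > MAX_AGE_SECONDS:
        return None
    return entry["report"]
```

In real use `now` would come from `time.time()`; passing it in explicitly just makes the staleness check easy to try out. A `None` result is the cue to go and fetch a fresh report before answering.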