Input, control and automation

Wednesday 3 February 2016

Building a Home AI, Part 3: The basics

03:30 Posted by Roxton
A month into my New Year's resolution, it's probably time to report on my progress so far. After all, Zuck has.

One month in, I have an application that resembles nothing so much as an extremely rudimentary chatbot. Mervyn can tell the time on request, and learn a few facts about you... and that's about it. Not terribly exciting, but the foundations have now been laid for more interesting things. Let's have a look at how it currently works.

First, the user types or speaks something - right now there's speech recognition, but no grammars or keyword spotting or any of the clever stuff that would be necessary for it to work in the home (as opposed to at the desk). Then, that input is examined to determine which task to run. There are lots of ways to do this, ranging from a simple switch over the input to the kind of deep learning kit that Google is reportedly embedding in its new handsets. For the kind of instruction-driven set-up I want, I can probably get by with normalization and basic pattern-matching; there are some very nice open-source tools available for more advanced NLP and sentiment analysis and so on, but for Mervyn (as it is currently envisaged), they're overkill.
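To make "normalization and basic pattern-matching" concrete, here is a minimal sketch in Python. It is not how Mervyn actually does it; the pattern table, the task names (`tell_time`, `learn_name`) and both function names are made up for illustration.

```python
import re

# Hypothetical pattern table: each regex maps to an invented task name.
PATTERNS = [
    (re.compile(r"^what(?: is|s) the time$"), "tell_time"),
    (re.compile(r"^what time is it$"), "tell_time"),
    (re.compile(r"^my name is (?P<name>.+)$"), "learn_name"),
]


def normalize(text):
    """Lower-case the input, drop punctuation and collapse whitespace."""
    text = text.lower()
    text = re.sub(r"[^a-z0-9\s]", "", text)
    return " ".join(text.split())


def match_task(text):
    """Return (task, arguments) for the first matching pattern, else (None, {})."""
    cleaned = normalize(text)
    for pattern, task in PATTERNS:
        found = pattern.match(cleaned)
        if found:
            # Named groups in the regex become the task's arguments.
            return task, found.groupdict()
    return None, {}
```

With this table, "What's the time?" and "what is the time" both normalize to something the first pattern accepts. One obvious cost of normalizing this crudely is that captured arguments come back lower-cased, so a name would need its capitals restored somewhere else.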

What I need is a system that provides normalization, advanced pattern-matching (including sets, variables, etc.), and support for conversation trees and recursion. For now I'm using SIML, a variant of AIML (the Artificial Intelligence Markup Language developed for Alicebot). It's not without its drawbacks - it carries a number of features I don't need or want, such as EmotionML, and its logic can be frustratingly limited (text comparison, for example) - but it meets the current requirements, is easy to edit, and has a really nice little prototyping tool for rapid development. At some point I'm going to have to abandon it and roll my own solutions, but it does the trick for now.

Anyway, SIML takes the input and passes back a command to Mervyn to run a particular task with certain arguments. Mervyn then runs that task, which may include passing some kind of output to the user.
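The "run a particular task with certain arguments" step can be sketched as a simple lookup table of functions. This is an illustration in Python under my own assumptions, not Mervyn's real code: the `TASKS` registry, the `task` decorator, `run_command` and the two example tasks are all hypothetical names.

```python
from datetime import datetime

# Hypothetical registry mapping task names to the functions that run them.
TASKS = {}


def task(name):
    """Decorator that registers a function under a task name."""
    def register(func):
        TASKS[name] = func
        return func
    return register


@task("tell_time")
def tell_time(now=None):
    # Accept an explicit time so the task is easy to test.
    now = now or datetime.now()
    return "It is " + now.strftime("%H:%M") + "."


@task("learn_name")
def learn_name(name):
    return "Nice to meet you, " + name + "."


def run_command(name, arguments):
    """Look up the named task and call it with the supplied arguments."""
    func = TASKS.get(name)
    if func is None:
        return "Sorry, I don't know how to do that."
    return func(**arguments)
```

Whatever does the pattern-matching only has to hand back a task name and a dictionary of arguments; adding a new ability is then a matter of writing one more decorated function.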

For really basic things such as telling the time, that's all there is to it. However, for anything more complicated, Mervyn will have to refer to its store of information. For example, if I ask Mervyn whether I'll need an umbrella today, it needs to know:
  • Who I am
  • My schedule for today
  • The physical locations (if only approximate) of each item in the schedule
  • Weather forecasts for each of those locations.
That last item will be pulled from the internet, as might parts of the penultimate one, but the first two are things I'd rather never touched the internet's cloudy appendages. We'll need some kind of database, then.
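The umbrella question can be sketched like this, keeping the private data local and treating the forecast as the only thing fetched from outside. Everything here is an assumption for illustration: the `SCHEDULE` layout, the `needs_umbrella` function, the 0.5 threshold and the made-up numbers in `fake_forecast`.

```python
# Local, private data: who the user is and what they are doing today.
# The structure and the entries are invented for this example.
SCHEDULE = {
    "roxton": [
        {"what": "Work", "where": "office"},
        {"what": "Dinner", "where": "town centre"},
    ],
}


def needs_umbrella(user, fetch_forecast, threshold=0.5):
    """Return True if rain is likely at any location on the user's schedule.

    fetch_forecast is any callable that takes a location name and returns a
    chance of rain between 0 and 1. In practice it would wrap a weather
    service; only the location name ever leaves the machine.
    """
    locations = {item["where"] for item in SCHEDULE.get(user, [])}
    return any(fetch_forecast(place) >= threshold for place in locations)


def fake_forecast(place):
    """Stand-in forecast for illustration only; these numbers are made up."""
    return {"office": 0.1, "town centre": 0.8}.get(place, 0.0)
```

Calling `needs_umbrella("roxton", fake_forecast)` says yes because of the dinner in town. Passing the forecast in as a function keeps the identity and schedule lookups entirely offline.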

As a rule I'm more pro- than anti-SQL; I find all the MongoDB-is-webscale nonsense a bit tiresome, but I think this is one of those cases where SQL is not really the right answer. Mervyn needs to store lots of different bits of data, much of it structured differently or even indifferently. What I think I really want is almost an OO-type model, where I can add properties and relationships quickly and on the fly. I haven't worked out the best way of doing this so far - certainly I could hammer it all into SQL, and maybe that will be what happens, but I'd rather not if I can possibly avoid it. I'm looking into the various NoSQL database options at the moment, but for now I've just been using a very loose XML file. I'm still working this out and changing my mind every few minutes, so I shan't say more until I have something more definite. This is probably going to become the single most important aspect of Mervyn - being able to properly handle user data is key to providing a good experience.
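To show what I mean by an OO-type model with properties and relationships added on the fly, here is a rough in-memory sketch in Python. It is not a database and says nothing about persistence; the `Entity` class, its method names and the sample data are all hypothetical.

```python
class Entity:
    """A bag of free-form properties plus named links to other entities."""

    def __init__(self, kind):
        self.kind = kind
        self.properties = {}
        self.relations = {}

    def set_var(self, name, value):
        # Any property can be added at any time; there is no fixed schema.
        self.properties[name] = value

    def get_var(self, name, default=None):
        return self.properties.get(name, default)

    def relate(self, relation, other):
        # Relationships are named lists of other entities.
        self.relations.setdefault(relation, []).append(other)

    def related(self, relation):
        return list(self.relations.get(relation, []))


# Example usage with invented data.
user = Entity("person")
user.set_var("name", "Roxton")

event = Entity("event")
event.set_var("title", "Dinner")
event.set_var("where", "town centre")

user.relate("attends", event)
titles = [e.get_var("title") for e in user.related("attends")]
```

The attraction is that nothing has to be declared up front. The cost is that nothing is checked either, which is exactly the trade-off against SQL described above.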

So, what next? Here's my to-do list:
  • Settle on a database design so that I don't have to rewrite the getVar and setVar calls every day
  • Continue to add functions
    • The next one should probably be appointments and scheduling
    • I have a horrible feeling that this will involve writing an entire calendar app to avoid storing everything with Google/Facebook etc.
    • Obviously this means that there will have to be some Google or Facebook integration so that events can be pulled or pushed as the user wants, but I might postpone that for now.
    • All this is really so that I can ask Mervyn "what have I got on this weekend?" and have it know. I'm terrible at that stuff and having an external brain that doesn't involve bothering my wife would be very welcome for both of us.
  • Investigate keyword spotting and microphone technologies for home integration (I've heard good things about CMU Sphinx for the former; I'm not so sure about the latter).
  • Try and find a voice synth package that's not completely foul (Google had quite a nice British male voice a few years ago, but it's since disappeared).
  • Write my own version of SIML to parse user input and match it to the requested task.

A fairly chunky list, that. Wish me luck.