Mark Parker


Building a Smart Home - from Voice Assistant to MajorDom v1.0

Published on May 26th, 2023 · 5 min read

In 2019, I first learned that speech recognition and synthesis were possible in Python. Google Assistant, Siri, Cortana, and other assistants were even more limited and helpless than they are now, and adding custom voice commands was out of the question. That's when I became inspired to create my own voice assistant, one that would rival even Tony Stark's Jarvis.

While working on the core functionality, I started thinking about where to host this assistant. Keeping a laptop constantly powered on was not an option, and I didn't have any other computers available. That's when I discovered Raspberry Pi single-board computers. I wanted my voice assistant to be able to control lights, LED strips, and curtains, and Arduino boards were great for handling such tasks. The only thing left was to find a way to send commands from the Raspberry Pi to the Arduino boards. I didn't want to rely on Wi-Fi or Bluetooth for this, so when I found information online about the nRF24L01 radio modules, I gave them a try and liked the results.

This system worked quite well, but it had two key drawbacks:

  • The range was limited by the sensitivity of the microphone. With a good microphone, everything worked perfectly within a room, but not beyond.
  • For each parameter of each device, I had to add identical voice commands that differed only in the address and the message. It was inconvenient, but tolerable for the time being (the sketch just after this list shows the pattern).
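
Concretely, the duplication looked something like this hypothetical sketch; all names here are illustrative, not the project's real API:

def radio_send(address: int, message: bytes) -> None:
    """Stand-in for the nRF24L01 transmission from the Pi to an Arduino node."""
    print(f"-> node 0x{address:02X}: {message!r}")

commands = {}

def add_command(phrase: str, action) -> None:
    commands[phrase] = action

# Every parameter of every device needed its own near-identical command,
# differing only in the node address and the payload:
add_command("turn on the lamp",   lambda: radio_send(0x01, b"ON"))
add_command("turn off the lamp",  lambda: radio_send(0x01, b"OFF"))
add_command("open the curtains",  lambda: radio_send(0x02, b"OPEN"))
add_command("close the curtains", lambda: radio_send(0x02, b"CLOSE"))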

To solve the first issue, I added an HTTP interface to the voice assistant using Django, which could accept either an audio file or a string. Combined with a Kotlin mobile application, this turned my phone into a wireless microphone, expanding the range from a single room to the router's coverage area, meaning the entire apartment and even slightly beyond. Carrying a phone around the home wasn't always convenient, though, so a few days later I developed an application for my Wear OS smartwatch, which turned out to be an incredibly handy solution.
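
For illustration, a minimal sketch of what such an endpoint could look like; the view name and helper functions are my assumptions, since the post doesn't show the real code:

from django.http import JsonResponse
from django.views.decorators.csrf import csrf_exempt

def recognize(audio_file) -> str:
    """Stub standing in for the speech-recognition step."""
    return "turn on the lamp"

def handle_command(text: str) -> str:
    """Stub standing in for the assistant's command handler."""
    return f"ok: {text}"

@csrf_exempt  # called from a mobile app, not a browser form
def command(request):
    if request.method != "POST":
        return JsonResponse({"error": "POST only"}, status=405)
    if "audio" in request.FILES:
        text = recognize(request.FILES["audio"])  # an uploaded recording
    else:
        text = request.POST.get("text", "")       # a plain text command
    return JsonResponse({"reply": handle_command(text)})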

However, I desired more: access to my assistant anytime, not just at home. The simplest solution was using a Telegram bot as the input-output interface, but I couldn't shake the feeling that a bot wasn't quite right. I decided to keep it as a temporary solution until I had time to develop something better.

I wanted to be able to use my mobile application to access the assistant remotely; I just needed a way to send a request to the local Django server without being on the local network. I was ready to open and forward ports on my router, but my provider didn't give me a public IP address. That's when I tried ngrok. It worked well at first, but on the free plan the tunnel occasionally crashed and changed its address. I also quickly dismissed the option of a VPN tunnel through a VPS: the cost was comparable to an ngrok subscription, and the implementation was much more complex.

Then I remembered that I had free PHP hosting on Beget, and I reinvented long polling and queues. The implementation was straightforward: the application sent a request to the hosting server, where the PHP code appended the request body (JSON) to the end of an array and wrote it to a local file. The Raspberry Pi at home sent a read request for this file every second and, after reading it, cleared the array. This way, I was able to send commands home from anywhere on the planet! I set up receiving responses from the assistant the same way: I duplicated the implementation and reversed the roles. Two files and four endpoints on the free PHP hosting gave me stable bidirectional communication with my home assistant.

Shortly after, I taught the assistant to send me messages on its own, such as the room number of the next class at the beginning of each break. But before I could boast about it at college, someone started spamming me at home. I had to add authentication: the login and password were hardcoded in the application, and the server had a check along these lines:

if ($login == 'markparker' && $password == 'MyVeryStrongP@ssw0rd!') { /* handle the request */ }

The application's repositories were private, and the server had no repository at all (why have a repository for a single file with fewer than 100 lines?), so the level of security was more than sufficient.
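
To make the mechanism concrete, here is a rough reconstruction of the Raspberry Pi side of the polling loop. The endpoint paths, payload shape, and base URL are my assumptions; only the one-second interval and the read-then-clear behavior come from the description above:

import time

import requests

BASE = "https://example.beget.tech/assistant"  # hypothetical address on the free hosting

def handle(command: dict) -> str:
    """Stub standing in for the assistant's command handler."""
    return f"done: {command}"

def poll_forever() -> None:
    while True:
        # Read the queued commands; the server clears the array once it has been read.
        commands = requests.get(f"{BASE}/pop_commands.php", timeout=5).json()
        for command in commands:
            reply = handle(command)
            # Push the reply into the mirrored response queue for the app to poll.
            requests.post(f"{BASE}/push_response.php", json={"reply": reply}, timeout=5)
        time.sleep(1)  # the one-second polling interval from the post

if __name__ == "__main__":
    poll_forever()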

Shortly after, the system gained its first automatic command trigger. With the help of a small feature in my application, I was able to capture an event every time an alarm went off on my phone. This trigger launched the first full scenario: as the alarm rang, the curtains opened, the assistant announced the time, the weather, and the class schedule at the college, and if the room was still dark, the lamp smoothly turned on. At that moment, I felt like a real Tony Stark.
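
The scenario itself could be expressed as something like this hypothetical sketch; every device call and helper below is an illustrative stand-in, not the project's real API:

import datetime

DARK_LUX = 15  # assumed threshold for "the room is still dark"

def announce(text: str) -> None:
    print(f"[assistant] {text}")  # stands in for speech synthesis

def on_alarm_fired() -> None:
    open_curtains()  # radio command to the curtains' Arduino node
    now = datetime.datetime.now().strftime("%H:%M")
    announce(f"Good morning. It is {now}. {weather()} First class: {next_room()}.")
    if room_brightness() < DARK_LUX:
        fade_in_lamp()  # the smooth turn-on mentioned above

# Stubs standing in for the real integrations:
def open_curtains() -> None: print("[radio] curtains -> OPEN")
def fade_in_lamp() -> None: print("[radio] lamp -> FADE_IN")
def weather() -> str: return "Clear, 18 degrees."
def next_room() -> str: return "room 204"
def room_brightness() -> float: return 5.0

if __name__ == "__main__":
    on_alarm_fired()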

Then I wanted to add more automatic scenarios using motion sensors, presence detectors, lighting sensors, and so on. At this point, the second drawback I mentioned earlier became more noticeable. There was a lot of code duplication, and working with it became less convenient. The project only had the concept of voice commands; there was no notion of devices and triggers. That's when I realized how much my voice assistant had grown: I was already building a full-fledged smart home, not just a question-and-answer assistant.

This realization led me to the decision to split off the voice assistant and make the smart home a standalone project, focused on device control rather than voice commands. This time I decided to do it properly from the start, with a full-fledged server, databases, authentication, and a mobile application. Later, a college professor suggested using WebSockets instead of my PHP array-to-file workaround, and that is how I now implement remote control of devices over the Internet.
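
A minimal sketch of what that WebSocket link could look like on the hub, using the Python websockets library; the server URL and the message format are my assumptions:

import asyncio
import json

import websockets

SERVER = "wss://example.com/hub"  # hypothetical server address

def handle(command: dict) -> str:
    """Stub standing in for dispatching the command to a device."""
    return f"ok: {command.get('device')}"

async def hub_loop() -> None:
    # One persistent connection replaces the one-second file polling:
    # the server can push commands to the hub the moment they arrive.
    async with websockets.connect(SERVER) as ws:
        async for raw in ws:
            command = json.loads(raw)
            await ws.send(json.dumps({"result": handle(command)}))

if __name__ == "__main__":
    asyncio.run(hub_loop())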

The overall concept remained unchanged: a hub in the form of a Raspberry Pi single-board computer controls Arduinos via the nRF24L01 radio module. I'll explain the architecture in more detail in the next article.


Other articles:

  • AI-powered Mobile App with Backend in Two Days (Tutorial) · May 5th, 2024 · 15 min read
    This article delves into the nuts and bolts of creating a Proof of Concept (PoC) of a mobile app built with the SwiftUI framework and a backend using FastAPI. As an extra, I'll demonstrate effective architecture patterns for SwiftUI apps, specifically MVVMP combined with SOLID principles and Dependency Injection (DI). For Android, the code can be easily translated to Kotlin using the Jetpack Compose framework almost without changes.
  • Dr. House — AI Diagnostician in your phone. Passing the Torch and Entrusting a Startup to Capable Hands · May 4th, 2024 · 4 min read
    This article picks up where the previous one, How We Built an AI Startup in a Weekend Hackathon in Germany, left off, focusing more on the final product than on the hackathon process itself.
  • How We Built an AI Startup in a Weekend Hackathon in Germany · May 4th, 2024 · 9 min read
    Here's a rundown of my weekend at a Cologne hackathon, where we aimed to start an AI startup in just two days. We went from pitching ideas on Friday night to demoing a working app by Sunday. It involved coding late into the night, figuring out last-minute tech snags, and even putting together a presentation minutes before our demo. As a bonus, I've highlighted a to-do list of the main points for creating a startup.