What Is AWS Personalize?
“Product recommendation engine” is a general term that applies to far more than online shopping. Take YouTube, for example. If you sign up for a new YouTube account, you’ll get a lot of general videos that appeal to a mass audience, mostly whatever is trending. However, if you search for “minecraft letsplay” and watch a half-hour video, the YouTube recommendation algorithm will take note of this. It will look at the tags, title, channel, posting date, and other metadata from the video that you liked, and then, using machine learning, try to find other videos that are similar to it and had similar engagement from other users. Perhaps you’ll get more videos from the same series, since people tend to watch things in chronological order. Perhaps there’s another channel making similar content that you might like as well.
AWS Personalize packages this kind of engine into a standalone PaaS that doesn’t require any specific machine learning knowledge. You feed it user actions (clicked on this post, listened to this song for X minutes, etc.), and it returns new recommendations from your product catalogue when requested. Recommendations may start out a little spotty, but they become more accurate as the model is trained on more interaction data.
Setting Up AWS Personalize
Each AWS Personalize project will have three datasets:
Users, which tracks metadata about the users themselves
Items, which functions as a product catalogue
Interactions, which logs interaction events between users and items
The Interactions dataset is the most important of the three, as it tracks all events and functions as the basis for training the model. The Users and Items datasets provide supplemental data that helps the model make intelligent connections. For example, if Personalize knows a user’s age, it can recommend different products to different age groups based on how likely each product is to be relevant.
The default option is to import historical data from a CSV file, though you can use the Event Tracker API to send real-time updates once you get everything going. You’ll need some training data to import, though: the import will fail if you have fewer than 1,000 entries in your interactions list. If you’re just looking to test out Personalize, you’ll need to create some dummy data that adheres to your schema before proceeding with the import.
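For a quick test, dummy interaction data can be generated with a few lines of Python. This is only a sketch: it assumes the interaction fields that Personalize expects (USER_ID, ITEM_ID, and TIMESTAMP in Unix epoch seconds), and the function name, ID formats, and row counts are made up for illustration.

```python
import csv
import random
import time


def write_dummy_interactions(path, n_rows=1500, n_users=50, n_items=200, seed=0):
    """Write n_rows random USER_ID/ITEM_ID/TIMESTAMP rows to a CSV file."""
    rng = random.Random(seed)  # seeded so the same IDs are drawn on every run
    now = int(time.time())
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        # Header names must match the field names in the dataset schema.
        writer.writerow(["USER_ID", "ITEM_ID", "TIMESTAMP"])
        for _ in range(n_rows):
            user_id = f"user_{rng.randrange(n_users)}"
            item_id = f"item_{rng.randrange(n_items)}"
            # Unix epoch seconds, spread over the last 30 days.
            timestamp = now - rng.randrange(30 * 24 * 3600)
            writer.writerow([user_id, item_id, timestamp])
    return n_rows
```

Calling `write_dummy_interactions("interactions.csv")` produces a file that clears the 1,000-row minimum. Random data like this is only good for checking that the pipeline works; the resulting recommendations will be meaningless.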
Head over to the AWS Personalize Management Console to get started. Create a new dataset group, which will function as an individual “App.” It will ask for a name:
Click next, and you’ll automatically be brought to configure the interactions import. Give it a name (“interactions”), and define your schema. This is in Apache Avro format, and tells Personalize what fields each interaction (or item/user) has. For interactions, the most basic schema pairs a USER_ID with an ITEM_ID, along with a TIMESTAMP for the event. The two IDs are used to look up users and items in the other datasets (a many-to-many relational link).
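As a rough illustration, a minimal interactions schema looks like the one below. The sketch builds it as a Python dictionary and prints the JSON, which can then be pasted into the schema box in the console. It covers only the three basic fields; any extra fields you track (event type, for example) would be added to the `fields` list.

```python
import json

# Minimal Avro schema for the Interactions dataset.
# Field names must match the CSV header exactly.
interactions_schema = {
    "type": "record",
    "name": "Interactions",
    "namespace": "com.amazonaws.personalize.schema",
    "fields": [
        {"name": "USER_ID", "type": "string"},
        {"name": "ITEM_ID", "type": "string"},
        {"name": "TIMESTAMP", "type": "long"},  # Unix epoch seconds
    ],
    "version": "1.0",
}

if __name__ == "__main__":
    print(json.dumps(interactions_schema, indent=2))
```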
You’ll next need to import data into Personalize, from a CSV file in S3. First, select or create a service role that can access this bucket. You’ll also need to attach the following bucket policy to the target bucket to allow Personalize to access it, replacing bucketname with your bucket’s name:
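A policy along the following lines gives Personalize read access to the bucket. The sketch builds it in Python and prints the JSON to paste into the bucket’s permissions tab. The helper name is made up for illustration, the `Id` and `Sid` values are arbitrary labels, and `bucketname` is the placeholder to replace.

```python
import json


def personalize_bucket_policy(bucket_name):
    """Return a bucket policy letting the Personalize service read the bucket."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Id": "PersonalizeS3BucketAccessPolicy",
        "Statement": [
            {
                "Sid": "PersonalizeS3BucketAccessPolicy",
                "Effect": "Allow",
                "Principal": {"Service": "personalize.amazonaws.com"},
                "Action": ["s3:GetObject", "s3:ListBucket"],
                # ListBucket applies to the bucket, GetObject to the objects in it.
                "Resource": [bucket_arn, f"{bucket_arn}/*"],
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(personalize_bucket_policy("bucketname"), indent=4))
```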
Then you can paste in the path to the file:
Click finish, and you’ll be brought to the datasets panel, where you’ll see that the interactions dataset is now configured. You’ll need to repeat this process twice more, creating datasets for users and items. The import will probably take a few minutes, depending on the size of your data.
Once everything is imported, you must create a solution: a model trained on your data that serves as the basis for campaigns, which provide the actual recommendations. Create one from the dashboard:
Give it a name, and select the recipe you’d like to use to power the solution. You can select this manually, or you can choose “AutoML,” which lets Personalize pick a recipe for you from its HRNN (hierarchical recurrent neural network) family. If you’re unsure, select AutoML.
Solutions will have multiple versions to make managing them easier. When you create the solution, the initial version will be created as well.
Once your solution version finishes training, you can create a campaign, which is essentially a deployed inference engine for getting actual recommendations. It exposes a REST API endpoint that you can query from your application.
From the “Campaigns” tab in the sidebar, create a new campaign, give it a name, and select your solution. Once that’s created, you should be able to test it out from the AWS CLI:
This command will fetch the recommendations from your campaign for the user ID specified. If everything works correctly, you should see a list of item IDs recommended for the user.
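The same query can be made from application code. Below is a minimal sketch using boto3’s `personalize-runtime` client and its `get_recommendations` call. The helper name and the optional `client` parameter (used to pass in a stand-in for testing) are assumptions for illustration, and the campaign ARN is whatever the console shows for your campaign.

```python
def get_recommended_item_ids(campaign_arn, user_id, num_results=10, client=None):
    """Return the item IDs a campaign recommends for one user."""
    if client is None:
        # Imported lazily so the helper can be exercised without AWS access.
        import boto3
        client = boto3.client("personalize-runtime")
    response = client.get_recommendations(
        campaignArn=campaign_arn,
        userId=str(user_id),
        numResults=num_results,
    )
    # Each entry in itemList is a dict that includes an "itemId" key.
    return [item["itemId"] for item in response["itemList"]]
```

In an application you would call `get_recommended_item_ids(campaign_arn, user_id)` with your own campaign ARN and then look the returned IDs up in your product catalogue.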
To add real-time data to the solution, you’ll need to create an Event Tracker from the sidebar. This will give you a tracking ID, which you use when sending events.
There are two ways to set this up: if you’re using AWS Amplify, AWS’s web and mobile app backend framework, setup is simple, and you’ll just have to configure it from the Amplify console. If you’re not, you’ll have to set up a Lambda function to process the data and send it to Personalize.
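For the Lambda route, a handler might look roughly like the sketch below, which forwards one interaction to Personalize through boto3’s `personalize-events` client and its `put_events` call. The shape of the incoming `event` (keys `userId`, `sessionId`, `eventType`, `itemId`) and the `TRACKING_ID` environment variable are assumptions made for illustration; adapt them to however your application delivers events to Lambda.

```python
import datetime
import os


def build_event(event_type, item_id, sent_at=None):
    """Build one entry for the eventList parameter of put_events."""
    if sent_at is None:
        sent_at = datetime.datetime.now(datetime.timezone.utc)
    return {"eventType": event_type, "itemId": str(item_id), "sentAt": sent_at}


def send_interaction(tracking_id, user_id, session_id, event_type, item_id, client=None):
    """Send a single interaction event to the Event Tracker."""
    if client is None:
        # Imported lazily so the helpers can be exercised without AWS access.
        import boto3
        client = boto3.client("personalize-events")
    client.put_events(
        trackingId=tracking_id,
        userId=str(user_id),
        sessionId=str(session_id),
        eventList=[build_event(event_type, item_id)],
    )


def lambda_handler(event, context, client=None):
    # Hypothetical payload: the keys read here are assumptions, not a fixed format.
    send_interaction(
        tracking_id=os.environ["TRACKING_ID"],  # hypothetical env var with the tracker's ID
        user_id=event["userId"],
        session_id=event["sessionId"],
        event_type=event["eventType"],
        item_id=event["itemId"],
        client=client,
    )
    return {"statusCode": 200}
```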