edX AWS Analytics Deployment¶
This document describes how to deploy the edX Analytics stack to Amazon Web Services. We use Terraform and Ansible to automate deployment where Ansible playbooks are available, but some manual setup is still required.
To deploy LMS/CMS, see openEdx AWS deployment and upgrades.
Prerequisites¶
This document assumes a working edxapp setup exists, with an `edxapp` MySQL database, `ENABLE_OAUTH2_PROVIDER` set to `true`, and tracking logs rotated into an S3 bucket, named something like `client-name-tracking-logs`.
To apply the terraform changes, you will need AWS credentials for an IAM User with AdministratorAccess.
We run the analytics ansible playbooks from a separate EC2 `director` instance, with the openedx/configuration repository and its dependencies installed.
General Security Considerations¶
- Store your AWS credentials (key, secret) securely, and delete them when they are no longer required.
- Each set of instances (`analytics`, `director`, etc.) should have its own ssh keypair.
- Access between resources is granted by Security Groups, e.g. the edxapp RDS can be queried by the analytics EMR instances, because they are members of a Security Group which must be applied to the edxapp database.
- The default Jenkins setup is unprotected and allows anyone to do anything, so never allow external access to Jenkins (even briefly). Close port 8080 on the Jenkins instance to the outside world before running the ansible script that installs Jenkins, and use SSH tunneling to connect to it from your machine.
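The SSH tunnel mentioned in the last item can be sketched as follows. This is a minimal illustration, not part of the official setup: the helper name `jenkins_tunnel_cmd`, the `ubuntu` login user, and the hostname are placeholders. The helper only prints the command so it can be reviewed before running it.

```shell
# Print the ssh command that forwards local port 8080 to port 8080
# on the Jenkins instance. Both arguments are placeholders.
jenkins_tunnel_cmd() {
    key_file="$1"   # private key for the instance, e.g. analytics.pem
    remote="$2"     # login and host, e.g. ubuntu@analytics.example.com
    # -N: do not run a remote command, only forward the port
    # -L: bind local port 8080 to localhost:8080 as seen from the remote host
    printf 'ssh -i %s -N -L 8080:localhost:8080 %s\n' "$key_file" "$remote"
}

# Hypothetical key file and host
jenkins_tunnel_cmd analytics.pem ubuntu@analytics.example.com
```

While the printed command is running, Jenkins is reachable at `http://localhost:8080` from your machine, and port 8080 can stay closed in the Security Group.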
Sensitive Data¶
We will add our sensitive data, such as database passwords and key files, to a secure repo created for each client deployment. The files and their expected contents are discussed in subsequent sections.
- `analytics.pem` - AWS certificate for the analytics instance
- `vars-analytics.yml` - ansible variables used to set up the analytics cluster. These variables can be stored in a separate file, or appended to the base `vars.yml` file used for the full edxapp setup.
- `analytics-tasks/`: Analytics-related configuration files
    - `jenkins_env`: environment variables used when running analytics tasks via Jenkins
    - `emr-vars.yml`: extra variables used to provision the EMR cluster
    - `analytics-override.cfg`: configuration for the analytics pipeline
- `edxapp_creds` - contains readonly credentials to be used to access edxapp DBs (`edxapp`, `ecommerce`, etc.).
- `edxanalytics_creds` - contains read-write credentials to be used to access analytics DBs (`analytics-api`, `reports`, etc.).
AWS Resources¶
We will use terraform to create the following AWS resources in this setup.
Note: The service links below point to an older version of this documentation which created these services manually. The details may be out of date now, but are provided for reference.
- One EC2 instance for hosting Insights (analytics dashboard), the Analytics API, and Jenkins (analytics scheduler). Alternatively, you may create a separate EC2 instance for each service, but ensure that they all share a security group.
- Two to five S3 buckets.
- One RDS instance for Insights and Analytics API MySQL databases.
- One ElasticSearch instance.
- EMR clusters are provisioned on a per-task basis.
- Access between resources is controlled by IAM.
Terraform¶
Setup terraform¶
- Install OpenTofu.
- Copy the files in resources/terraform to your client's secure repository.
- Update the variables in variables.tf.
- Set up an AWS profile locally. Use the `aws_profile` name and `aws_region` chosen in `variables.tf`.
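As a rough sketch of the last step, a named profile can be appended to the AWS CLI config file by hand. The helper name `write_aws_profile` and the example profile name and region are hypothetical; use the `aws_profile` and `aws_region` values from your own `variables.tf`.

```shell
# Append a named profile section to an AWS CLI config file.
# Usage: write_aws_profile <config-file> <profile-name> <region>
write_aws_profile() {
    config_file="$1"
    profile="$2"
    region="$3"
    mkdir -p "$(dirname "$config_file")"
    # Named profiles in the config file use a "[profile NAME]" header
    cat >> "$config_file" <<EOF
[profile ${profile}]
region = ${region}
EOF
}

# Example with hypothetical values:
# write_aws_profile "$HOME/.aws/config" client-analytics us-east-1
```

The access key and secret for the profile still need to be stored separately, for example by running `aws configure --profile <profile-name>`, which prompts for them.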
Initial creation¶
- Change directories to where you've stored your terraform `*.tf` files from Setup.
- From the terminal, run:

    ```shell
    # Downloads the terraform provider and source templates
    tofu init
    # Preview the changes that will be made (check these carefully)
    tofu plan
    # Apply those changes
    tofu apply
    ```
Upgrading existing resources¶
Follow this procedure to upgrade/replace instances that were created using the initial creation steps. These instructions assume that you're running only one instance, but the same logic can be applied for multiple instances.
- Change directories to where you've stored your terraform `*.tf` files from Setup.
- Update `variables.tf` to temporarily increment the `analytics_number_of_instances` count (to 2).
- Run `tofu apply` to create the new EC2 instance.
- Provision the new instance using ansible.
Once you're satisfied that the new instance is working, replace the old instance with the new one.
- Update `variables.tf` and decrement the `analytics_number_of_instances` count (back to 1).
- Replace the old analytics instance with the new one in terraform state:

    ```shell
    tofu state rm 'module.analytics.aws_instance.analytics[0]'
    tofu state mv 'module.analytics.aws_instance.analytics[1]' 'module.analytics.aws_instance.analytics[0]'
    ```

- Update `variables.tf` to increase the `analytics_instance_iteration` counter (this is just for record purposes and doesn't affect functionality).
- Run `tofu apply` again to move the new instance into place.
- Manually stop/terminate the old EC2 instance once everything is verified OK.
MySQL database¶
The analytics databases and users need to be created manually.
Create analytics databases and user¶
Create `dashboard`, `analytics-api`, and `reports` databases, and an `analytics` user with a password:
From the director instance, run the following command with the root RDS user to launch the mysql shell:
Run this SQL in the mysql shell:
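A minimal sketch of this step, assuming the stock `mysql` client and the database and user names listed above. The `'%'` host pattern, the `<analytics_password>` placeholder, the `root` user name, and the file name `create_analytics_dbs.sql` are assumptions; adjust them to your deployment.

```shell
# Write the SQL to a file so it can be reviewed before it is applied.
# The quoted 'SQL' delimiter stops the shell from interpreting the
# backticks around the database names.
cat > create_analytics_dbs.sql <<'SQL'
CREATE DATABASE IF NOT EXISTS `dashboard`;
CREATE DATABASE IF NOT EXISTS `analytics-api`;
CREATE DATABASE IF NOT EXISTS `reports`;
-- Assumption: '%' accepts connections from any host; narrow it if you can.
CREATE USER 'analytics'@'%' IDENTIFIED BY '<analytics_password>';
GRANT ALL PRIVILEGES ON `dashboard`.* TO 'analytics'@'%';
GRANT ALL PRIVILEGES ON `analytics-api`.* TO 'analytics'@'%';
GRANT ALL PRIVILEGES ON `reports`.* TO 'analytics'@'%';
SQL

# To apply it from the director instance (placeholder host and root user;
# -p with no value prompts for the password):
#   mysql -h analytics-rds-name.other-stuff.rds.amazonaws.com -u root -p < create_analytics_dbs.sql
```

Feeding a file to the client is equivalent to pasting the statements into the mysql shell, and leaves a record of what was run.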
Store the database credentials in `vars-analytics.yml`:

- `ANALYTICS_MYSQL_HOST`: 'analytics-rds-name.other-stuff.rds.amazonaws.com'
- `ANALYTICS_MYSQL_USER`: 'analytics'
- `ANALYTICS_MYSQL_PASSWORD`: 'analytics_password'
- `ANALYTICS_MYSQL_PORT`: '3306'

and in `edxanalytics_creds`.
Create migration user¶
Ansible tasks use common credentials for DB migrations, which by default are set to match the edxapp credentials.
The easiest way to satisfy this is to create a user in the analytics database with the same credentials as the edxapp mysql user. Use these commands to create the user in the analytics DB (replace `edxapp` and `<edxapp_password>` with your actual DB user credentials):
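One possible shape for these commands is sketched below. Which databases the migration user needs is an assumption here (the `dashboard` and `analytics-api` databases created earlier), as are the `'%'` host pattern and the file name.

```shell
# Write the SQL to a file for review; the quoted delimiter protects the backticks.
cat > create_migration_user.sql <<'SQL'
-- Replace edxapp and <edxapp_password> with your actual DB user credentials.
CREATE USER 'edxapp'@'%' IDENTIFIED BY '<edxapp_password>';
-- Assumption: migrations run against the Insights and Analytics API databases.
GRANT ALL PRIVILEGES ON `dashboard`.* TO 'edxapp'@'%';
GRANT ALL PRIVILEGES ON `analytics-api`.* TO 'edxapp'@'%';
SQL

# Apply against the analytics RDS instance (placeholder host and root user):
#   mysql -h analytics-rds-name.other-stuff.rds.amazonaws.com -u root -p < create_migration_user.sql
```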
Create edxapp read-only user¶
Some analytics tasks import data from `edxapp`-series DBs (`edxapp`, `ecommerce`, etc.). Create a dedicated user with readonly permissions on the edxapp DB server:
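A read-only user could be created along these lines. This is a sketch under assumptions: the user name `read_only`, the `'%'` host pattern, the password placeholder, and the edxapp RDS host placeholder in the comment are all hypothetical, and the list of databases should match the ones your analytics tasks actually read.

```shell
# Write the SQL to a file for review; the quoted delimiter protects the backticks.
cat > create_readonly_user.sql <<'SQL'
-- Hypothetical user name; SELECT is the only privilege granted.
CREATE USER 'read_only'@'%' IDENTIFIED BY '<read_only_password>';
GRANT SELECT ON `edxapp`.* TO 'read_only'@'%';
GRANT SELECT ON `ecommerce`.* TO 'read_only'@'%';
SQL

# Apply against the edxapp DB server, not the analytics one
# (placeholder host and root user; -p prompts for the password):
#   mysql -h <edxapp-rds-host> -u root -p < create_readonly_user.sql
```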
Store these credentials in `edxapp_creds`.
Insights/Analytics API Setup¶
See Insights Setup.
Jenkins Setup¶
See Jenkins Setup.