Why?

Let’s get right to the point. Througout the course of education at Brigham Young University I’ve taken a lot of classes that deal with data. Thats what you’d expect when your major is Statistics right? But most of the time the data that we use is being provided for us. We don’t need to go through the process of gathering the data before we can start analyzing it. Yes, in bigger companies that would be a DBA’s job and we wouldn’t need to worry about it. However, I’ve found in my short professional career that you are much more valuable when you know how to get the information yourself from a database. Heck, just understanding how it works will will make your life in statistics SO much easier.

What?

The database we’re going to use is MySQL, mainly because its an open-source solution. This means that we don’t have to pay for it! MySQL has been around since the 90’s, and aside from its open-source nature it is also very scalable, which makes it attractive to growing businesses. MySQL’s developers are also very responsive to their open-source community and are quick to bring new features as well as fix bugs.

How?

Lets get started! The first thing you’ll need to do is download the correct version of MySQL for your machine and operating system. In this tutorial I’ll be going through the installation on MacOS, but you can find the right download you’ll need HERE.

After you’ve downloaded your correct installer, run the installer. The installation will vary depending on which system you are using. For basic use you shouldn’t need to choose any opetions other than the default. At some point during your installation you’ll be asked to create a default user. MAKE SURE YOU REMEMBER THE PASSWORD THAT YOU SET FOR THIS USER! However, if you forget, there are ways to create a new user from the terminal outside of your MySQL server. Once the installation has finished you’ll need to start your local MySQL server. On MacOS it looks like this:

Figure

Once you’ve started your server, connecting to it is easy. Using this command, you’ll enter your username and password and you’ll be in!

Figure

Creating your first database and table

Now that you’re in, go ahead and create your first database using the command CREATE DATABASE your_database_name and hit enter. After you create your database you’ll need to go inside your new database by typing USE your_database_name. After you’ve selected your new database we’ll create a table and fill it with some data. Much like creating a database, the command to create a table is CREATE TABLE your_table_name.

Now that you have a table, lets fill it with some data! If you have a table of data on your machine already you can add it with this command:

Figure

In this example from MySQL’s website, the table they’re inserting the data into is called “pet” and the file is called “pet.txt”. But what if you don’t have any data files to insert? No problem! Here is an example of data being manually entered into the “pet” table:

Figure

Yes, I understand this is alot, but lets look at a quick example that shows how to connect RStudio to your local database and query that database using R. This has alot more peices to it, and we’re not going to go into too much detail, but here’s the basics. First you’ll need to install the RMySQL package in RStudio. Once thats done you’ll need to set up your connection. Here’s an example with a database called “sakila” and username and password “stats”.

Figure

Once you set up that connection you can use a set of built in commands that the RMySQL package has in it. This one will list all the tables in the database you’re connected to, and you just need to pass it your connection variable you just created.

Figure

Thats it! Once you have a local database on your machine you can now learn how to push and pull information to and from this database using python and R and whatever other tools you may use for your job or school. Obviously we’re not going to go into that this time, but MYSQL.COM has so much good information to help you along whenever you get stuck. I would also recommend a good youtube video or two to get some examples.

Thanks for the read, I hope it has been helpful!