MATLAB Tutorial

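Welcome to the MATLAB tutorial version of Data Science Rosetta Stone. Before beginning this tutorial, please make sure you have MATLAB installed. Several functions used below (grpstats, tabulate, corr) also require the Statistics and Machine Learning Toolbox.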

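Note: In MATLAB, comments are written as follows:

% This is a single-line comment.
%{
This is a block (multi-line) comment.
The %{ and %} markers must each appear on a line of their own.
%}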

Now let’s get started!


1 Reading in Data and Basic Statistical Functions

1.1 Read in the data.

a) Read the data in as a .csv file.

First specify the format of the variables to be read in: %C = categorical (used here for the text columns), %d = 32-bit integer, %f = double-precision floating point number.

formatSpec = '%C%C%d%f%f';
student = readtable('class.csv', 'Delimiter', ',', 'Format', formatSpec);
readtable()
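
To check what each format specifier produces, the sketch below writes a small CSV file to a temporary location and reads it back with the same formatSpec. The three rows are copied from the class data displayed later in this tutorial; the temporary file name is only an illustration and stands in for class.csv.

```matlab
% Build a tiny table using three rows of the class data shown in section 1.4.
T = table({'Alfred'; 'Alice'; 'Barbara'}, {'M'; 'F'; 'F'}, ...
    [14; 13; 13], [69; 56.5; 65.3], [112.5; 84; 98], ...
    'VariableNames', {'Name', 'Sex', 'Age', 'Height', 'Weight'});

% Write it to a temporary CSV file (a stand-in for class.csv).
csvFile = [tempname '.csv'];
writetable(T, csvFile);

% Read it back with the same format specification used above.
formatSpec = '%C%C%d%f%f';
student = readtable(csvFile, 'Delimiter', ',', 'Format', formatSpec);

% Inspect the resulting variable types.
disp(class(student.Name));    % categorical
disp(class(student.Age));     % int32
disp(class(student.Height));  % double

delete(csvFile);              % clean up the temporary file
```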

b) Read the data in as a .xls file.

readtable() also reads Excel spreadsheets. The example below uses the newer .xlsx format; the same call works for a .xls file.

student_xlsx = readtable('class.xlsx');
readtable()

c) Read the data in as a .json file.

student_json = jsondecode(fileread('class.json'));
fileread() | jsondecode()
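
Unlike readtable(), jsondecode() returns a struct (or struct array) rather than a table. The sketch below assumes the JSON file holds an array of records with identical fields; the JSON text is a hypothetical stand-in, since the actual layout of class.json is not shown here. Under that assumption, struct2table() converts the result to a table.

```matlab
% Hypothetical JSON text standing in for the contents of class.json.
jsonText = ['[{"Name":"Alfred","Sex":"M","Age":14},' ...
            '{"Name":"Alice","Sex":"F","Age":13}]'];

% An array of objects with the same fields decodes to a struct array.
records = jsondecode(jsonText);

% Convert the struct array to a table with one row per record.
student_tbl = struct2table(records);
disp(student_tbl);
```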

1.2 Find the dimensions of the data set.

disp(size(student));
    19     5
size() | disp()
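
The two numbers are the number of rows (observations) and columns (variables). As a small sketch, they can also be captured separately; the table T below is an illustrative five-row stand-in for student, built from rows shown in section 1.4.

```matlab
% Illustrative stand-in for the student table (rows taken from section 1.4).
T = table({'Alfred'; 'Alice'; 'Barbara'; 'Carol'; 'Henry'}, ...
    categorical({'M'; 'F'; 'F'; 'F'; 'M'}), ...
    int32([14; 13; 13; 14; 14]), ...
    [69; 56.5; 65.3; 62.8; 63.5], ...
    [112.5; 84; 98; 102.5; 102.5], ...
    'VariableNames', {'Name', 'Sex', 'Age', 'Height', 'Weight'});

% Capture rows and columns in separate variables.
[nRows, nCols] = size(T);
fprintf('%d observations, %d variables\n', nRows, nCols);

% height() and width() return the same two numbers for a table.
disp(height(T));
disp(width(T));
```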

1.3 Find basic information about the data set.

summary(student);
Variables:

    Name: 19×1 categorical

        Values:

            Alfred     1
            Alice      1
            Barbara    1
            Carol      1
            Henry      1
            James      1
            Jane       1
            Janet      1
            Jeffrey    1
            John       1
            Joyce      1
            Judy       1
            Louise     1
            Mary       1
            Philip     1
            Robert     1
            Ronald     1
            Thomas     1
            William    1

    Sex: 19×1 categorical

        Values:

            F     9
            M    10

    Age: 19×1 int32

        Values:

            Min       11
            Median    13
            Max       16

    Height: 19×1 double

        Values:

            Min       51.3
            Median    62.8
            Max         72

    Weight: 19×1 double

        Values:

            Min       50.5
            Median    99.5
            Max        150
summary()

1.4 Look at the first 5 (last 5) observations.

In the row position, "1:5" selects only the first 5 observations; in the column position, the ":" operator selects all columns (variables).

disp(student(1:5,:));
     Name      Sex    Age    Height    Weight
    _______    ___    ___    ______    ______

    Alfred     M      14       69      112.5
    Alice      F      13     56.5         84
    Barbara    F      13     65.3         98
    Carol      F      14     62.8      102.5
    Henry      M      14     63.5      102.5
disp()
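
The last observations can be selected the same way with the end keyword. The sketch below uses an illustrative five-row table T (rows from the output above) and selects the first and last n = 3 rows; for the full data set, student(end-4:end,:) would give the last 5.

```matlab
% Illustrative stand-in for the student table (rows taken from the output above).
T = table({'Alfred'; 'Alice'; 'Barbara'; 'Carol'; 'Henry'}, ...
    categorical({'M'; 'F'; 'F'; 'F'; 'M'}), ...
    int32([14; 13; 13; 14; 14]), ...
    [69; 56.5; 65.3; 62.8; 63.5], ...
    [112.5; 84; 98; 102.5; 102.5], ...
    'VariableNames', {'Name', 'Sex', 'Age', 'Height', 'Weight'});

n = 3;
firstRows = T(1:n, :);          % first n observations
lastRows  = T(end-n+1:end, :);  % last n observations

disp(firstRows);
disp(lastRows);
```

Recent MATLAB releases also provide head(T, n) and tail(T, n) for tables, which do the same thing.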

1.5 Calculate means of numeric variables.

age = table2array(student(:,3));
disp(mean(age));
   13.3158
height = table2array(student(:,4));
disp(mean(height));
   62.3368
weight = table2array(student(:,5));
disp(mean(weight));
  100.0263
table2array() | mean() | disp()
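
As an alternative sketch, a table variable can be extracted with dot indexing instead of table2array(), and varfun() can apply mean() to several variables at once. The table T is an illustrative five-row stand-in for student, so the values differ from the full-data means above.

```matlab
% Illustrative stand-in for the student table (rows taken from section 1.4).
T = table({'Alfred'; 'Alice'; 'Barbara'; 'Carol'; 'Henry'}, ...
    categorical({'M'; 'F'; 'F'; 'F'; 'M'}), ...
    int32([14; 13; 13; 14; 14]), ...
    [69; 56.5; 65.3; 62.8; 63.5], ...
    [112.5; 84; 98; 102.5; 102.5], ...
    'VariableNames', {'Name', 'Sex', 'Age', 'Height', 'Weight'});

% Dot indexing returns the column as a plain array.
meanHeight = mean(T.Height);
disp(meanHeight);

% varfun applies mean to each listed variable and returns a one-row table
% with variables named mean_Age, mean_Height and mean_Weight.
means = varfun(@mean, T, 'InputVariables', {'Age', 'Height', 'Weight'});
disp(means);
```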

1.6 Compute summary statistics of the data set.

numeric_vars = student(:,{'Age', 'Height', 'Weight'});
statarray = grpstats(numeric_vars, [], {'min', 'median', 'mean', 'max'});
disp(statarray);
           GroupCount    min_Age    median_Age    mean_Age    max_Age    min_Height    median_Height    mean_Height    max_Height    min_Weight    median_Weight    mean_Weight    max_Weight
           __________    _______    __________    ________    _______    __________    _____________    ___________    __________    __________    _____________    ___________    __________

    All    19            11         13            13.316      16         51.3          62.8             62.337         72            50.5          99.5             100.03         150

grpstats() | disp()
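
The empty second argument above means "no grouping", which is why the single output row is labeled All. As a sketch of the grouped case, passing a variable name instead computes the statistics per group. The table T is an illustrative five-row stand-in for student.

```matlab
% Illustrative stand-in for the student table (rows taken from section 1.4).
T = table(categorical({'M'; 'F'; 'F'; 'F'; 'M'}), ...
    [69; 56.5; 65.3; 62.8; 63.5], ...
    [112.5; 84; 98; 102.5; 102.5], ...
    'VariableNames', {'Sex', 'Height', 'Weight'});

% Group by Sex and compute the mean and max of the remaining variables.
% Requires the Statistics and Machine Learning Toolbox.
bySex = grpstats(T, 'Sex', {'mean', 'max'});
disp(bySex);
```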

1.7 Descriptive statistics functions applied to variables of the data set.

weight = table2array(student(:,5));
disp(std(weight));
   22.7739
disp(sum(weight));
   1.9005e+03
disp(length(weight));
    19
disp(max(weight));
   150
disp(min(weight));
   50.5000
disp(median(weight));
   99.5000
table2array() | std() | sum() | length() | max() | min() | median()
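
Note that std() normalizes by N-1 by default, giving the sample standard deviation. The sketch below checks this by hand on five weights taken from the rows shown in section 1.4, so the value differs from the full-data result above.

```matlab
% Five weights from the rows displayed in section 1.4.
weight = [112.5; 84; 98; 102.5; 102.5];

% Sample standard deviation computed from its definition (N-1 in the denominator).
n = numel(weight);
manualStd = sqrt(sum((weight - mean(weight)).^2) / (n - 1));

% Compare with the built-in function.
builtinStd = std(weight);
fprintf('manual: %.4f, std(): %.4f\n', manualStd, builtinStd);
```

For a vector, length() and numel() return the same count; numel() is the safer choice when the input might be a matrix.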

1.8 Produce a one-way table to describe the frequency of a variable.

a) Produce a one-way table of a discrete variable.

tabulate(age);
  Value    Count   Percent
      1        0      0.00%
      2        0      0.00%
      3        0      0.00%
      4        0      0.00%
      5        0      0.00%
      6        0      0.00%
      7        0      0.00%
      8        0      0.00%
      9        0      0.00%
     10        0      0.00%
     11        2     10.53%
     12        5     26.32%
     13        3     15.79%
     14        4     21.05%
     15        4     21.05%
     16        1      5.26%
tabulate()
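
The rows for ages 1 through 10 appear because tabulate() lists every integer from 1 up to the maximum when given a vector of positive integers. A minimal sketch of two ways to list only the ages that occur is shown below, using five ages from the rows in section 1.4: counting with unique() and accumarray(), or converting the vector to categorical before calling tabulate().

```matlab
% Five ages from the rows displayed in section 1.4.
age = [14; 13; 13; 14; 14];

% Base MATLAB: count occurrences of each distinct value.
[values, ~, idx] = unique(age);
counts = accumarray(idx, 1);
percent = 100 * counts / numel(age);
disp(table(values, counts, percent));

% With the Statistics and Machine Learning Toolbox: a categorical input
% is tabulated by category, so only observed ages are listed.
tabulate(categorical(age));
```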

b) Produce a one-way table of a categorical variable.

sex = table2array(student(:,{'Sex'}));
tabulate(sex);
  Value    Count   Percent
      F        9     47.37%
      M       10     52.63%
table2array() | tabulate()

1.9 Produce a two-way table to describe the frequency of two categorical or discrete variables.

crosstable = varfun(@(x) length(x), student, 'GroupingVariables', {'Age' 'Sex'}, 'InputVariables', {});
disp(crosstable);
    Age    Sex    GroupCount
    ___    ___    __________

    11     F      1
    11     M      1
    12     F      2
    12     M      3
    13     F      2
    13     M      1
    14     F      2
    14     M      2
    15     F      2
    15     M      2
    16     M      1
varfun()
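
The varfun() result is in "long" form, with one row per observed Age and Sex combination. As an alternative sketch, crosstab() from the Statistics and Machine Learning Toolbox returns the counts as a matrix, including zero cells. The vectors below are taken from the five rows shown in section 1.4.

```matlab
% Ages and sexes from the rows displayed in section 1.4.
age = [14; 13; 13; 14; 14];
sex = categorical({'M'; 'F'; 'F'; 'F'; 'M'});

% Rows of counts correspond to ages, columns to sexes.
[counts, ~, ~, labels] = crosstab(age, sex);

disp(labels);   % first column: age labels, second column: sex labels
disp(counts);
```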

1.10 Select a subset of the data that meets a certain criterion.

% Build a logical index marking the female students, and then use it to select
% those observations from the student table.
females = student(student.Sex == 'F',:);
disp(females(1:5,:));
     Name      Sex    Age    Height    Weight
    _______    ___    ___    ______    ______

    Alice      F      13     56.5         84
    Barbara    F      13     65.3         98
    Carol      F      14     62.8      102.5
    Jane       F      12     59.8       84.5
    Janet      F      15     62.5      112.5
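
Conditions can be combined with the element-wise operators & (and) and | (or). The sketch below, on an illustrative five-row stand-in for student, selects females older than 13. It also guards the row range with min(), because indexing 1:5 raises an error when the subset has fewer than 5 rows.

```matlab
% Illustrative stand-in for the student table (rows taken from section 1.4).
T = table({'Alfred'; 'Alice'; 'Barbara'; 'Carol'; 'Henry'}, ...
    categorical({'M'; 'F'; 'F'; 'F'; 'M'}), ...
    int32([14; 13; 13; 14; 14]), ...
    [69; 56.5; 65.3; 62.8; 63.5], ...
    [112.5; 84; 98; 102.5; 102.5], ...
    'VariableNames', {'Name', 'Sex', 'Age', 'Height', 'Weight'});

% Combine two conditions into one logical index.
olderFemales = T(T.Sex == 'F' & T.Age > 13, :);

% Show at most 5 rows, even if the subset is smaller.
nShow = min(5, height(olderFemales));
disp(olderFemales(1:nShow, :));
```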

1.11 Determine the correlation between two continuous variables.

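The first argument of the cat function is dim, the dimension along which to concatenate. It is set to 2 here so that the two variables are placed side by side as the columns of one matrix, and corr() then returns the matrix of pairwise correlations between those columns.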

height_weight = cat(2,table2array(student(:,4)),table2array(student(:,5)));
disp(corr(height_weight));
    1.0000    0.8778
    0.8778    1.0000
table2array() | cat() | corr()
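
The off-diagonal entry is the Pearson correlation coefficient. As a sketch, the code below computes it from its definition and compares it with corrcoef(), which is part of base MATLAB. The five height and weight pairs come from the rows shown in section 1.4, so the value differs from the full-data result above.

```matlab
% Heights and weights from the rows displayed in section 1.4.
h = [69; 56.5; 65.3; 62.8; 63.5];
w = [112.5; 84; 98; 102.5; 102.5];

% Square brackets concatenate column-wise, just like cat(2, h, w).
hw = [h, w];

% Correlation matrix from base MATLAB.
R = corrcoef(hw);

% Pearson correlation from its definition.
dh = h - mean(h);
dw = w - mean(w);
rManual = sum(dh .* dw) / sqrt(sum(dh.^2) * sum(dw.^2));

fprintf('corrcoef: %.4f, manual: %.4f\n', R(1, 2), rManual);
```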

2 Basic Graphing and Plotting Functions

2.1 Visualize a