vlainic.github.io

My GitHub blog: things you might be interested in, and probably not...

[E.1] AML - Reinforcement Learning assignment typos

I would like to start a new blog series - education [E], as I am still in the process of re-qualification from “pure science” to “data science”.

A few months ago I finished my fourth consecutive Coursera course from the Advanced Machine Learning specialization - Practical Reinforcement Learning.

Here is my review of the course (rating 5/5): “This is my fourth AML course, and for now I would say it is the best one. It connects lectures and practice in the best way. On the other hand, there are mistakes all around, as it is beta-version. In my opinion, it is not fair to put the beta-version course into paid specialization.”

Yes, you read that right - it is a beta version. Hence, a few of the practical tasks (assignments) had mistakes where they should not be, i.e. in the parts of the code that the student is not supposed to change. I will list all of them here and try to make somebody's life easier :)

Week 1 / Assignment 1 - “OpenAI Gym”:

The assignment contains a wrong hint: “Hint: your action at each step should depend either on t or on s.” One should use only t!

Week 6 / Assignment 8 - “Bandits & exploration”:

Change 1 - class BernoulliBandit:

class BernoulliBandit:

def pull(self, action):

line

if np.random.random() > self._probs[action]:

changed to:

if np.any(np.random.random() > self._probs[action]):

i.e. the condition is put under np.any().
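The reason for the wrapper is not stated above. One plausible explanation (an assumption on my part, not something confirmed by the course material) is that action sometimes arrives as an array rather than a single integer, so the comparison produces a boolean array that a plain if cannot evaluate. A minimal sketch with made-up probabilities:

```python
import numpy as np

# Hypothetical success probabilities for three arms.
probs = np.array([0.2, 0.5, 0.8])

# Assumed problem case: the action is an array of indices, not a scalar.
array_action = np.array([0, 2])
array_check = np.random.random() > probs[array_action]  # boolean array, shape (2,)

# A plain `if` on a multi-element boolean array raises ValueError.
try:
    if array_check:
        pass
    ambiguous = False
except ValueError:
    ambiguous = True

# np.any() collapses the array to a single truth value, so `if` works again.
reduced = bool(np.any(array_check))

print(ambiguous, reduced)
```

Note that np.any() returns True if at least one element is True, so for a multi-element action this is a workaround that makes the code run, not necessarily the statistically intended behaviour.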

Change 2 - def plot_regret:

def plot_regret(scores):

line

plt.legend([agent.name for agent in scores])

changed to

plt.legend([agent for agent in scores])

i.e. agent.name is changed to solely agent.
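This change fits the case where scores is a dictionary keyed by agent names (plain strings), which is also how submit_bandits2 in Change 3 indexes it. A small sketch with made-up regret values shows why agent.name fails and agent alone works:

```python
# Hypothetical regret curves, keyed by agent name (a plain string).
scores = {
    "EpsilonGreedyAgent": [1.0, 2.0, 3.0],
    "UCBAgent": [1.0, 1.5, 2.0],
    "ThompsonSamplingAgent": [1.0, 1.2, 1.4],
}

# Iterating over a dict yields its keys, so each `agent` is already the label.
labels = [agent for agent in scores]

# The original expression fails because a str has no `name` attribute.
try:
    [agent.name for agent in scores]
    name_lookup_failed = False
except AttributeError:
    name_lookup_failed = True

print(labels, name_lookup_failed)
```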

Change 3 - submission:

Instead of using submit_bandits, add a new function submit_bandits2 to submit.py:

def submit_bandits2(agents, scores, email, token):
    epsilon_greedy_agent = None
    ucb_agent = None
    thompson_sampling_agent = None
    for agent in agents:
        if "EpsilonGreedyAgent" in agent.name:
            epsilon_greedy_agent = agent.name
        if "UCBAgent" in agent.name:
            ucb_agent = agent.name
        if "ThompsonSamplingAgent" in agent.name:
            thompson_sampling_agent = agent.name
    assert epsilon_greedy_agent is not None
    assert ucb_agent is not None
    assert thompson_sampling_agent is not None
    grader = grading.Grader("VL9tBt7zEeewFg5wtLgZkA")
    grader.set_answer("YQLYE", (int(scores[epsilon_greedy_agent][int(1e4) - 1]) - int(scores[epsilon_greedy_agent][int(5e3) - 1])))
    grader.set_answer("FCHOZ", (int(scores[epsilon_greedy_agent][int(1e4) - 1]) - int(scores[ucb_agent][int(1e4) - 1])))
    grader.set_answer("0JWHl", (int(scores[epsilon_greedy_agent][int(5e3) - 1]) - int(scores[ucb_agent][int(5e3) - 1])))
    grader.set_answer("4rH5M", (int(scores[epsilon_greedy_agent][int(1e4) - 1]) - int(scores[thompson_sampling_agent][int(1e4) - 1])))
    grader.set_answer("TvOqm", (int(scores[epsilon_greedy_agent][int(5e3) - 1]) - int(scores[thompson_sampling_agent][int(5e3) - 1])))
    grader.submit(email, token)

Week 6 / Assignment 9 - “MCTS”:

class WithSnapshots(Wrapper):

def load_snapshot(self,snapshot):

line

self.render(close=True) # close popup windows since we can’t load into them

changed to

self.close()
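Presumably the original call fails because the installed Gym version no longer accepts a close argument in render(); that is my assumption, not something the assignment states. The sketch below uses a hypothetical StubEnv class (not a real Gym environment) to show the failure and a fallback that works with either style:

```python
class StubEnv:
    """Hypothetical stand-in for an env whose render() has no `close` keyword."""

    def __init__(self):
        self.viewer_open = True

    def render(self, mode="human"):
        return None

    def close(self):
        self.viewer_open = False


def close_windows(env):
    """Try the old-style call first, fall back to env.close() if it is rejected."""
    try:
        env.render(close=True)  # raises TypeError when `close` is not accepted
    except TypeError:
        env.close()


env = StubEnv()
close_windows(env)
print(env.viewer_open)
```

For the assignment itself, replacing the line with self.close() as shown above is the simpler fix; the fallback is only useful if the same notebook has to run against different Gym versions.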


I hope somebody will find this helpful, and please let me know if you think something else should be listed here.

The original post is on SteemIT, but I abandoned that platform a few weeks ago.