{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pprint\n",
"c = ['⌚️', '📱', '💻', '💶']\n",
"print_emission = lambda x: ''.join(c[x_] for x_ in x)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"class HMM:\n",
" def __init__(self, T, B, pi):\n",
" self.T = np.array(T)\n",
" self.B = np.array(B)\n",
" self.pi = np.array(pi)\n",
"\n",
" def sample(self, sequence_length):\n",
" emissions = list()\n",
" states = list()\n",
" # ...\n",
"\n",
" def _alpha(self, sequence):\n",
" # Init\n",
" # ...\n",
" for idx, s in enumerate(sequence[1:]):\n",
" # Previous probabilities * p(transition) * p(observation)\n",
" # ...\n",
" return alpha\n",
"\n",
" def _beta(self, sequence):\n",
" # Init ...\n",
" for idx, s in enumerate(sequence[:-1][::-1]):\n",
" # ...\n",
" return beta\n",
" \n",
" def state_posteriors(self, sequence):\n",
" # return posteriors\n",
"\n",
" def sequence_likelihood(self, sequence):\n",
" return np.sum(self._alpha(sequence)[-1])\n",
"\n",
" def viterbi(self, sequence):\n",
" # Initial probabilities\n",
" probs = # ...\n",
" stack = []\n",
" for s in sequence[1:]:\n",
" # Total transition probabilities\n",
" # ...\n",
" # Select best previous state for each state\n",
" # ...\n",
" # Include observation probabilities\n",
" # ...\n",
" # Append to stack\n",
"\n",
" # Select highest probability\n",
" state_sequence = # ...\n",
"\n",
" # Backtrack\n",
" while stack:\n",
" # ...\n",
"\n",
" return state_sequence[::-1]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Intro"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"To handle the workload on Christmas, Santa decided to assign the wrapping of the presents to three helpers. He also lets them choose which of the four possible presents [⌚️, 📱, 💻, 💶] they pick. Each of the helpers has his own preferences (described by the emission probability matrix B). Due to space constraints, they cannot work in parallel. So they agreed to switch after each present with a certain probability (described by the transition matrix T) and place all presents in a queue. For each day, Santa gets a batch of 20 wrapped presents."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# Transition probabilities\n",
"T = [[0.5, 0.25, 0.25],\n",
" [0.1, 0.8, 0.1],\n",
" [0.4, 0.4, 0.2]]\n",
"# Emission probabilities\n",
"B = [[0.9, 0.1/3, 0.1/3, 0.1/3],\n",
" [0.1/2, 0.8, 0.1, 0.1/2],\n",
" [0., 0., 0.5, 0.5]]\n",
"# Start probabilities\n",
"pi = [1, 0, 0]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"hmm = HMM(T, B, pi)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Task 1"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"Rumor has it that the helpers enjoyed the local christmas market much more than their work. So for one day, they forced a gnome to do the packaging for them. However, they did not tell the gnome to include the special advertising card. The batch without these cards has to be opened and rewrapped with the card included. This is a lot of work and Santa wants to open as few presents as possible to find the batch without the cards. He figures, that he might exploit the fact that the gnome has its very own preferences for choosing the presents. Also, the size of the package tells him what kind of present is inside.\n",
"\n",
"Sort the batches in a way, that the least amount of presents has to be opened to find the batch without the cards."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"with open('batches', 'rb') as fid:\n",
" batches = pickle.load(fid)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"for idx, b in enumerate(batches):\n",
" print('Day {}: {}'.format(idx, print_emission(b)))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Solution"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Task 2"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now Santa wants to pay the helpers for their work. The helpers agreed to be payed based on the number of presents they wrapped. However, the list with the counts for each day somehow got lost at their latest visit of the christmas market. To solve this problem, Santa decides that the helpers get paid based on the best approximation of the number of presents wrapped given the batches and his model.\n",
"\n",
"Find a way to approximate the number of presents wrapped by each helper."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Solution"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Task 3"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"After hearing of the incident, the union of the helpers demands more transparency and wants to know the probability for each present being wrapped by a specific helper"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Solution"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Task 4"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We need more presents! Write a function to sample batches from the model."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Bonus"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Of course the helpers wrapped more than 20 presents per day. 200 for example seems to be more realistic. Such a long sequence leads to numerical issues which can be resolved by either 1) rescaling or 2) working in the log-domain. Extend the class above to do the calculations in the log-domain."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (old Anaconda 3)",
"language": "python",
"name": "anaconda3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.5"
}
},
"nbformat": 4,
"nbformat_minor": 1
}