Booking a table at RAW

Feb. 26, 2019, 2:48 p.m.

Ever since I came to Taiwan I've been wanting to try one of its most famous restaurants: RAW. Making a reservation at RAW can be quite hard, as seats for two people sell out very quickly. RAW uses a reservation platform offered by EZtable.com. Seats for any given day open exactly 2 months in advance, at 12:00PM. Recently, I have made several attempts to be ready at the computer at this exact time to book a ticket before they sell out. Apparently I am not the only one, as every time I was bested by my competitors.

Being a data scientist, I feel like this can be solved more easily. I have experience with scraping the web using BeautifulSoup in Python. However, traversing a web page, clicking multiple JavaScript elements and typing information into fields is not possible with BeautifulSoup. Recently, I stumbled upon Puppeteer: a tool made by Google that lets programmers control Chrome programmatically. As I might use this tool for web scraping in the future, I decided to learn Puppeteer and solve my RAW problem at the same time. In this project I use Pyppeteer, a Python port of Puppeteer (which itself is written in JavaScript).

Let’s see what we are up against. We need to:

Select the number of people. Click through the menu until we are in our desired month (2 months in the future) and then click the appropriate day.

Wait, what’s that? Multiple prices? Upon investigation I found that these are different menus with different ingredients. In our script we will make sure we can account for our budget. Continuing our steps, we then need to click a time that fits our budget. It would be good to set a preferred time and go down a list ordered by descending preference.
After this we get a pop-up that asks us to agree with the terms and then click another button.

However, we then get a login screen that asks us to verify our phone number through SMS. Apparently this login system doesn’t use passwords but rather sends an SMS for verification. This might be a slight problem if we want this process to be completely automated at some point in the future… Luckily, Puppeteer allows for resuming a previous session, so we only need to log in once and then make sure Puppeteer keeps us logged in by refreshing the page at a certain time interval, which hopefully renews the cookies. Let’s take a look at which cookie saves our session:

It turns out that the cookie that stores the login information is valid for a year, great! Initially I planned on writing a script that would refresh the page in the background every hour or every few hours, depending on how long the session is valid. Now that we know this, we can just load the session days, weeks or even months later without needing to keep it alive.

After logging in we fill out our personal and credit card information (yes, we have to pay in advance) and confirm our booking.

So let’s get on with it.

This tool will have two scripts: login.py and book.py. The former is for logging in the first time and verifying our phone number. The latter will run at 12:00PM exactly two months before the day of our desired reservation, load the session and book the ticket.
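As a rough sketch of the timing rule, the moment at which seats open can be computed from the reservation date. The function name booking_opens_at is hypothetical, and clamping to the last day of a shorter month is purely my assumption: I have not checked how EZtable handles a date such as April 30, for which there is no February 30.

```python
import calendar
from datetime import date, datetime


def booking_opens_at(reservation_day, months_before=2, hour=12):
    """Return the datetime at which seats for reservation_day open."""
    # Count months since year 0 so that stepping back wraps across years.
    month_index = (reservation_day.year * 12
                   + (reservation_day.month - 1) - months_before)
    year, month = divmod(month_index, 12)
    month += 1
    # Assumption: clamp to the last day of the target month if it is shorter.
    last_day = calendar.monthrange(year, month)[1]
    day = min(reservation_day.day, last_day)
    return datetime(year, month, day, hour, 0)


if __name__ == "__main__":
    print(booking_opens_at(date(2019, 4, 4)))  # 2019-02-04 12:00:00
```

For the reservation used later in this post (April 4, 2019) this gives February 4, 2019 at noon, which is when book.py would have to run.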

Both scripts will have the following imports:

import asyncio
from pyppeteer import launch
from pyppeteer.errors import NetworkError
from datetime import datetime
import os

asyncio is a package that comes with Python 3.4+. It lets a program run other tasks while one task is idle. When loading multiple web pages this can be quite handy, since a client often needs to wait for a server to process a request. During this time the client can do other things (like requesting other web pages in parallel). In our script you will often see the keywords 'async' and 'await'. The first one marks a function as a coroutine that can be suspended; the second one is put before a call and lets asyncio know that, while that call is waiting, it can go do something else elsewhere. More information can be found in the asyncio documentation. Pyppeteer forces the user to use async. I am not really sure why that is and, since we are just surfing a single website, I don't think there will be a whole lot of asynchronous processing.
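To make 'async' and 'await' a bit more concrete, here is a small sketch that is independent of Pyppeteer. The fetch() function and the page names are made up for illustration, and asyncio.sleep() stands in for waiting on a server. Note that asyncio.run() requires Python 3.7 or newer.

```python
import asyncio


async def fetch(name, delay):
    # 'await' hands control back to the event loop while this task is idle.
    await asyncio.sleep(delay)
    return name


async def main():
    # Both "requests" wait at the same time, so the total is about 0.2 s
    # rather than 0.3 s. Results come back in the order of the arguments.
    return await asyncio.gather(fetch('page A', 0.2), fetch('page B', 0.1))


if __name__ == "__main__":
    print(asyncio.run(main()))  # ['page A', 'page B']
```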

We will also import the NetworkError exception class from Pyppeteer, as a network timeout will trigger this error and we might want to catch it.

if __name__ == "__main__":
    customer_info = {
        'ticket_id': '01',
        'phone_number': '0912345678',
        'name': 'Thomas Braam',
        'email': 'thomas.braam@gmail.com',
        'CC_no': '1337133713371337',
        'CC_exp': '0219',
        'CC_CVC': '555',
    }

    loop = asyncio.get_event_loop()

    connected = False

    while not connected:
        try:
            page = loop.run_until_complete(login(customer_info))
            connected = True
        except NetworkError:
            print("Time out...")
            pass


First we load the customer info, the same dictionary we also have in book.py. During login() we only use ‘ticket_id’ (to give a unique ID to this session) and ‘phone_number’ from this dictionary. Then we get the asyncio event loop, followed by a while loop that runs the Pyppeteer script. The while loop catches any time-out errors and keeps retrying when we encounter one.
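The while loop above retries forever. If you would rather give up after a few attempts, a bounded variant could look like the sketch below. The helper name run_with_retries and its parameters are hypothetical and not part of Pyppeteer; the exception type to retry on is passed in by the caller.

```python
import asyncio


async def run_with_retries(make_coro, retry_on, max_attempts=5, pause=1.0):
    """Await make_coro() until it succeeds or max_attempts is reached."""
    for attempt in range(1, max_attempts + 1):
        try:
            # A coroutine object can only be awaited once, so build a
            # fresh one for every attempt.
            return await make_coro()
        except retry_on:
            print("Time out... (attempt {} of {})".format(attempt, max_attempts))
            if attempt == max_attempts:
                raise  # Give up and surface the last error
            await asyncio.sleep(pause)
```

In login.py this would presumably be called as loop.run_until_complete(run_with_retries(lambda: login(customer_info), NetworkError)).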

async def login(customer_info):
    browser = await get_browser(customer_info)
    page = await get_page(browser)
    page = await first_login(page, customer_info)
    return(page)


async def get_browser(customer_info):
    if not os.path.exists(os.path.join('session', customer_info['ticket_id'])):
        os.makedirs(os.path.join('session', customer_info['ticket_id']))

    return await launch(headless=False,
                        userDataDir=os.path.join('session', customer_info['ticket_id']))


async def get_page(browser):
    page = await browser.newPage()
    await page.goto("https://en.eztable.com/")
    return page

 

login() runs through three steps, of which the first two are very simple: create the browser and open the web page. For launch() I’m using two parameters: headless=False makes Chrome open a visible window so that we can watch what happens, and userDataDir is the folder in which our session is stored. If we don’t supply this parameter we get a new session every time. The folder that holds my session is about 60MB. If disk space is an issue, saving and loading only the cookies is a possible alternative.
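The cookie-only alternative could be sketched as follows. The two helper names and the JSON file are my own choices; they rely on Pyppeteer's page.cookies() and page.setCookie(). I have not tested whether the EZtable login actually survives being restored this way, so treat it as a starting point.

```python
import json


async def save_cookies(page, path):
    """Write the cookies of the current page to a JSON file."""
    cookies = await page.cookies()
    with open(path, 'w') as f:
        json.dump(cookies, f)


async def load_cookies(page, path):
    """Restore previously saved cookies into a page."""
    with open(path) as f:
        cookies = json.load(f)
    # setCookie() takes each cookie dictionary as a separate argument.
    await page.setCookie(*cookies)
```

save_cookies() would be called at the end of login.py, and load_cookies() in book.py right after creating the page.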

Now, for the last part of our login.py script:

async def first_login(page, customer_info):
    # Click member icon to show login pop-up
    await page.waitForSelector('.header-member')
    await page.click('.header-member')
    await page.waitFor(2000)

    # Fill in phone number and click "Next"
    await page.type('#tel-input-header', customer_info['phone_number'])
    element = await page.Jx(
        '//div[@class="login-form"]//div[@class="btn border-btn primary-btn"]')
    element = element[0]
    await element.click()

    # Wait for SMS verification code and fill in
    await page.waitForXPath('//div[@class="login-form"]//input[@type="tel"]')
    ver_code = input('The code has been sent to your phone via SMS.\n'
                     'Please enter the code received: ')
    element = await page.Jx('//div[@class="login-form"]//input[@type="tel"]')
    element = element[0]
    await element.type(ver_code)

    # Click verify.
    element = await page.Jx(
        '//div[@class="login-form"]//div[@class="btn border-btn primary-btn"]')
    element = element[0]
    await element.click()

    return(page)

 

This is where some interesting automation begins. First we use waitForSelector() to wait for a specific element to load. The first thing we wait for is '.header-member'. Anyone who knows HTML probably knows that this is a class, as indicated by the leading period. For those who don't: similar elements are often grouped by giving them a class, which helps to apply formatting to a large number of elements at once. An id can be given to uniquely identify a single element. Classes and id's are written in HTML like this:

<p class="paragraph">I am a paragraph</p>
<p id="title">I am the title</p>

Many more categories of selectors exist. Classes and id's, however, have a convenient shorthand: a leading period (for classes) or a number sign (for id's).

Back to the script. We now know we are waiting for an element that has 'header-member' as class. How do we know the class of that element? In Chrome we can use 'Inspect' to look at the code an element is produced from. We will use this technique any time we want to know how to target a specific element.

We can see that the class is written as follows:

<div class="header-element header-member header-has-dropdown
            pull-right show-sm-block"></div>

A class name cannot contain spaces, so when an element's class attribute includes spaces it means that the element is a member of all mentioned classes, five in this case. Again, these classes don't have to be unique to this element, so if I picked a class from this list that is not specific I might end up with the wrong element, as waitForSelector() will just pick the first element it encounters (later, we will see how we can iterate through multiple elements in a list). I found that the 'header-member' class is unique to this element so we can use this one for now.
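To illustrate how one element can belong to several classes, here is a small sketch using only Python's standard library. It is not how Puppeteer matches selectors internally; it just splits the class attribute the same way. The second <div> with 'header-logo' is a made-up element to show a non-unique class.

```python
from html.parser import HTMLParser

SAMPLE_HTML = (
    '<div class="header-element header-member header-has-dropdown '
    'pull-right show-sm-block"></div>'
    '<div class="header-element header-logo"></div>'  # hypothetical element
)


class ClassFinder(HTMLParser):
    """Collect all elements that carry a given class."""

    def __init__(self, wanted_class):
        super().__init__()
        self.wanted_class = wanted_class
        self.matches = []

    def handle_starttag(self, tag, attrs):
        # The class attribute is a whitespace-separated list of class names.
        classes = (dict(attrs).get('class') or '').split()
        if self.wanted_class in classes:
            self.matches.append((tag, classes))


def find_by_class(html, wanted_class):
    finder = ClassFinder(wanted_class)
    finder.feed(html)
    return finder.matches
```

Here find_by_class(SAMPLE_HTML, 'header-member') finds exactly one element, while 'header-element' matches both, which is why picking an unspecific class can select the wrong element.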

Next, we click the element using the click() function. We then wait two seconds for the pop-up to load using the waitFor() command. Note: there are multiple ways to wait for an element (like the waitForSelector() function) and waiting a fixed length of time is probably not the best one. However, I could not work out a better way to wait for the next element, so I resorted to this ugly but workable solution.
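One thing that might replace the fixed two-second wait is to wait until the phone number field of the pop-up is actually visible. The sketch below uses the 'visible' and 'timeout' options of Pyppeteer's waitForSelector(); the helper name click_and_wait is hypothetical, and I have not verified that this is enough for the EZtable pop-up, so the fixed wait may still be needed.

```python
async def click_and_wait(page, click_selector, wait_selector, timeout_ms=5000):
    """Click one element, then wait until another one is visible."""
    await page.click(click_selector)
    # 'visible' waits until the element is displayed, not merely present
    # in the DOM; 'timeout' is given in milliseconds.
    await page.waitForSelector(
        wait_selector, {'visible': True, 'timeout': timeout_ms})
```

In first_login() this would be used as await click_and_wait(page, '.header-member', '#tel-input-header').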

Like click() we can also type(), which accepts the selector (#tel-input-header in this case) and what to type as first and second arguments.

After we type the phone number we use the Jx() function to store the matching elements in a variable, and then call click() on the element itself. Clicking an element this way is similar to calling click() on the page, but page.click() takes a CSS selector, while Jx() takes an XPath expression that lets us spell out where the element sits in the page. On this website unique id's are not used and classes are often not unique, so we want a query that can be more specific.

<div class="btn border-btn primary-btn">Submit</div>

As all of these classes are very unspecific we need to find another way of selecting this element. Jx() helps us to write a more complex selection query.

element = await page.Jx(
    '//div[@class="login-form"]//div[@class="btn border-btn primary-btn"]')

What I wrote in this Jx() can be translated as follows: first, look for a <div> whose class is "login-form". Then, among its descendants, look for a <div> whose class attribute is exactly "btn border-btn primary-btn", so that it is a member of all three mentioned classes. The result is a list of elements of which we select the first (there should be only one), followed by a click() on that element.
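The same query can be tried without a browser using Python's xml.etree.ElementTree, which supports a subset of XPath. The markup below is a made-up, simplified stand-in for the real page. Note two differences from Jx(): ElementTree needs a leading '.' on the path, and it only understands simple predicates like the ones used here.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup; the real page is far larger.
SAMPLE = """
<body>
  <div class="btn border-btn primary-btn">Search</div>
  <div class="login-form">
    <section>
      <div class="btn border-btn primary-btn">Submit</div>
      <div class="btn primary-btn border-btn">Reordered</div>
    </section>
  </div>
</body>
"""

QUERY = ('.//div[@class="login-form"]'
         '//div[@class="btn border-btn primary-btn"]')


def find_submit_buttons(markup):
    root = ET.fromstring(markup.strip())
    return [element.text for element in root.findall(QUERY)]


if __name__ == "__main__":
    print(find_submit_buttons(SAMPLE))  # ['Submit']
```

The 'Search' button is skipped because it sits outside the login form, and the 'Reordered' one because @class="..." compares the whole attribute string, so the order of the class names matters.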

Further on in first_login() we see a waitForXPath(), which is basically a waitForSelector() that accepts the same kind of query as Jx(). Here we look for an input field that is of type "tel" (phone number).

So, we get our SMS code, enter it at the script's prompt and get logged in. Since our session is saved automatically, we can just close the window, concluding login.py.

Onto book.py!

We start the script much like login.py, only with some references to extra functions and the ticket_info dictionary. The times in ticket_info are ordered by preference, so earlier entries take priority over later ones.

import asyncio
import re
from pyppeteer import launch
from pyppeteer.errors import NetworkError
from datetime import datetime
import os


async def book(ticket_info):
    browser = await get_browser()
    page = await get_page(browser)
    page, chosen_ticket = await find_and_select_day(page, ticket_info)
    page = await make_reservation(page, customer_info)
    return(page)


async def get_browser():
    browser = await launch(headless=False,
                           userDataDir=os.path.join('session', customer_info['ticket_id']))
    return browser


async def get_page(browser):
    page = await browser.newPage()
    await page.goto("https://wv.eztable.com/widget/raw?locale=en_US")
    return page

...

if __name__ == "__main__":
    ticket_info = {
        'year': '2019',
        'month': 'April',
        'day': '4',
        'time': ['18:15', '18:00', '18:30', '18:45', '19:00',
                 '19:15', '19:30', '19:45', '20:00'],
        'people': '4',
        'min_price': 2000,
        'max_price': 3000}

    customer_info = {
        'ticket_id': '01',
        'phone_number': '0912345678',
        'name': 'Thomas Braam',
        'email': 'thomas.braam@gmail.com',
        'CC_no': '1337133713371337',
        'CC_exp': '0219',
        'CC_CVC': '555',
    }

    loop = asyncio.get_event_loop()

    connected = False
    while not connected:
        try:
            page = loop.run_until_complete(book(ticket_info))
            connected = True
        except NetworkError:
            print("Time out...")
            pass

Then we go on to the web page and select our day and price:

async def find_and_select_day(page, ticket_info):
    # CSS selector of the drop-down for the number of people
    people_selector = ('#widget > main > div > div.selector-wrapper > '
                       'label > select')
    await page.waitForSelector(people_selector)

    # Select amount of people
    await page.select(people_selector, ticket_info['people'])

    # Select day
    ticket_date = datetime.strptime("{Y} {B} {d}".format(Y=ticket_info['year'],
                                                         B=ticket_info['month'],
                                                         d=ticket_info['day']),
                                    "%Y %B %d")
    ticket_date_string = ticket_date.strftime('%A, %B %-d, %Y')
    element = []
    await page.waitFor(1000)
    while not element:
        element = await page.Jx(
            '//button[@aria-label="{}"]'.format(ticket_date_string))
        if element:
            break  # Date found in the month currently shown
        element_next_month = await page.Jx(
            '//button[@aria-label="Move forward to switch to the next month."]')
        await element_next_month[0].click()
        await page.waitFor(1000)
    await element[0].click()

    ...

First we wait for a selector. This selector is acquired by right-clicking on the element in Inspect and going to Copy > Copy selector. This gives a very ugly looking full path to the element. As this is a drop-down menu, we then use select() on this element and select the number of people we want.
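A note on the date label built in the code above: the '%-d' directive (day of the month without a leading zero) is a platform-specific extension and is not available on Windows. A portable sketch that produces the same kind of label is shown below; the function name calendar_label is hypothetical, and the month and weekday names assume the default English locale.

```python
from datetime import datetime


def calendar_label(year, month_name, day):
    """Build a label such as 'Thursday, April 4, 2019'."""
    ticket_date = datetime.strptime(
        "{} {} {}".format(year, month_name, day), "%Y %B %d")
    # Use the integer day directly instead of the platform-specific '%-d'.
    return "{:%A, %B} {}, {:%Y}".format(
        ticket_date, ticket_date.day, ticket_date)


if __name__ == "__main__":
    print(calendar_label('2019', 'April', '4'))  # Thursday, April 4, 2019
```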

We then take the desired date from our ticket_info, convert it to the website's date format and store it in ticket_date_string. In the while loop we then try to find our date on the page. If we are not in the right month, the date cannot be found and our element variable remains an empty list, which triggers a click on the button that goes to the next month. Once we can select the element with our date we click it and go to the next step, which is looking for times and prices:

async def find_and_select_day(page, ticket_info):
    ...

    for i in range(1, len(elements) + 1):
        element = await page.Jx(
            '//div[@class="quota-group"][{}]'.format(i))
        element = await element[0].getProperty('textContent')  # Get price
        price = await element.jsonValue()
        price = int(re.findall(r'\d+,\d+', price)[0].replace(',', ''))

        element = await page.Jx(
            '//div[@class="quota-group"][{}]//li[@class="quota"]'.format(i))
        element_dict[price] = {}
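        # XPath positions are 1-based, so count the times from 1 as well;
        # the stored index is reused in an XPath query later on.
        for j, time_element in enumerate(element, start=1):
            time = await time_element.getProperty('textContent')  # Get time
            time = await time.jsonValue()
            element_dict[price][time] = (i, j)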

    ...

There are multiple <div>'s with the class "quota-group". Using Jx() we get a list that we can iterate over, extracting the price and all the associated times for each one.
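The price extraction in the loop above can be tried in isolation. The sketch below wraps the same regular expression in a hypothetical helper, parse_price(), and adds an explicit error when nothing matches. The sample text is invented, as I am not reproducing the exact text of the page here.

```python
import re


def parse_price(text):
    """Return the first comma-grouped number in text as an int."""
    # Same pattern as in the script: digits, a comma, digits. A price
    # below 1,000 has no comma and would therefore not match.
    match = re.search(r'\d+,\d+', text)
    if match is None:
        raise ValueError('No price found in {!r}'.format(text))
    return int(match.group().replace(',', ''))


if __name__ == "__main__":
    print(parse_price('NT$3,000 per person'))  # 3000
```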

To get the text inside a <div> we first use getProperty('textContent'), followed by jsonValue(). We then use a regular expression to get the number into a format that we like (e.g. "3000" instead of "$3,000"). The times for each price are items in a list (<li>), which we also iterate over. Then we put the indexes for every price [i] and time [j] in a dictionary so that later, once we have chosen our preferred time, we can easily select it. Next we will finally choose and select our preferred time:

async def find_and_select_day(page, ticket_info):
    ...

    # Choose ticket that is within price range and highest ranked in time.
    chosen_ticket = {}
    for price in sorted(element_dict.keys()):
        if ticket_info['min_price'] < price < ticket_info['max_price']:
            for time in ticket_info['time']:
                if time in element_dict[price]:
                    chosen_ticket['time'] = time
                    chosen_ticket['price'] = price
                    chosen_ticket['id'] = element_dict[price][time]
                    break  # Most preferred available time found
        if chosen_ticket:
            break  # Lowest price with an available time wins
    if not chosen_ticket:
        raise ValueError(
            'Could not find appropriate ticket,'
            ' probably no seats available within price range.')

    # Select that ticket
    element = await page.Jx(
        '//div[@class="quota-group"][{}]//li[@class="quota"][{}]'.format(
            chosen_ticket['id'][0],
            chosen_ticket['id'][1],))
    element = element[0]
    await element.click()

    # Click 'I Agree' and 'Confirm'
    await page.waitForSelector('#jamie-checkbox')
    await page.click('#jamie-checkbox')

    element = await page.Jx(
        '//div[@class="hanlai-popup jamie-popup"]//div[@class="button active"]')
    await element[0].click()

    return(page, chosen_ticket)

The ticket is chosen based on whether it is in our price range, with the lowest price getting priority. Within that price we go down our ordered list of preferred times. We then click that element using Jx() and click a few other things.
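Because the selection rule does not need a browser, it can be pulled out into a plain function and checked on its own. This is only a sketch: the name choose_ticket is hypothetical and the prices and times in the usage example are invented.

```python
def choose_ticket(element_dict, ticket_info):
    """Pick the lowest in-budget price, then the most preferred time."""
    for price in sorted(element_dict):
        if not ticket_info['min_price'] < price < ticket_info['max_price']:
            continue  # Outside our budget
        for time in ticket_info['time']:  # Ordered by preference
            if time in element_dict[price]:
                return {'time': time, 'price': price,
                        'id': element_dict[price][time]}
    raise ValueError('Could not find appropriate ticket,'
                     ' probably no seats available within price range.')


if __name__ == "__main__":
    # Invented example data: price -> {time: (group index, item index)}
    sample = {
        1850: {'18:00': (1, 1)},
        2680: {'18:30': (2, 1), '18:00': (2, 2), '19:00': (2, 3)},
        2980: {'18:15': (3, 1)},
    }
    wishes = {'time': ['18:15', '18:00', '18:30'],
              'min_price': 2000, 'max_price': 3000}
    print(choose_ticket(sample, wishes))
```

With this data the 2,680 menu at 18:00 is chosen: 1,850 is below the budget, and although 18:15 is the favourite time it is only offered at the more expensive 2,980 menu.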

In the last part we fill in our information in a straightforward fashion and we book our ticket.

async def make_reservation(page, customer_info):
    #### First page, customer info
    # Input name
    await page.waitForSelector('.input-group')
    element = await page.Jx('//div[@class="name-input-wrapper"]//input[@class="input"]')
    await element[0].type(customer_info['name'])

    # Input email
    element = await page.Jx('//div[@class="input-group"][3]//input[@class="input"]')
    await element[0].type(customer_info['email'])

    # Click 'Next'
    element = await page.Jx('//div[@class="btn footer-btn secondary-btn"]')
    await element[0].click()

    #### Second page
    # Click 'Next'
    await page.waitForXPath('//div[@class="btn footer-btn secondary-btn"]')
    element = await page.Jx('//div[@class="btn footer-btn secondary-btn"]')
    await element[0].click()

    #### Third page, credit card info
    # Input credit card number
    await page.waitForSelector('.ccNumber')
    await page.type('.ccNumber', customer_info['CC_no'])

    # Input credit card expiry date
    await page.type('.ccExpiry', customer_info['CC_exp'])

    # Input credit card CVC
    await page.type('.ccCVC', customer_info['CC_CVC'])

    # Click 'Next'
    element = await page.Jx('//div[@class="btn footer-btn secondary-btn"]')
    await element[0].click()

    # Click 'Confirm'    
    await page.waitForXPath('//div[@class="btn secondary-btn"]')
    element = await page.Jx('//div[@class="btn secondary-btn"]')
    await element[0].click()

    return(page)

And we're done! All that's left to get your ticket is to set up a scheduler to run book.py at the right time and you're set!
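If you don't want to set up cron or the Windows Task Scheduler, one rough alternative is to start the script early and let it sleep until the seats open. The helpers below are hypothetical and assume the machine stays awake and its clock is accurate; a real scheduler is probably the more robust choice.

```python
import asyncio
from datetime import datetime


def seconds_until(target, now=None):
    """Seconds from now until target, never negative."""
    now = now or datetime.now()
    return max(0.0, (target - now).total_seconds())


async def run_at(target, make_coro):
    """Sleep until target, then await the coroutine built by make_coro()."""
    await asyncio.sleep(seconds_until(target))
    return await make_coro()
```

In book.py this could be used as loop.run_until_complete(run_at(datetime(2019, 2, 4, 12, 0), lambda: book(ticket_info))) for the April 4 reservation.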