<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Untitled Publication]]></title><description><![CDATA[Untitled Publication]]></description><link>https://blog.phreakyphoenix.tech</link><generator>RSS for Node</generator><lastBuildDate>Tue, 07 Apr 2026 10:28:32 GMT</lastBuildDate><atom:link href="https://blog.phreakyphoenix.tech/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[Fixing Guake Window Alignment in Multi-Monitor Setups]]></title><description><![CDATA[If you’ve ever used Guake (the drop-down terminal for GNOME) with a multi-monitor setup, you may have noticed that on the primary monitor the Guake window sometimes overflows beyond the visible screen area — usually extending past the top or right ed...]]></description><link>https://blog.phreakyphoenix.tech/fixing-guake-window-alignment-in-multi-monitor-setups</link><guid isPermaLink="true">https://blog.phreakyphoenix.tech/fixing-guake-window-alignment-in-multi-monitor-setups</guid><category><![CDATA[guake ]]></category><category><![CDATA[Ubuntu]]></category><category><![CDATA[window management]]></category><category><![CDATA[development]]></category><category><![CDATA[technology]]></category><dc:creator><![CDATA[Aditya Jyoti Paul]]></dc:creator><pubDate>Fri, 05 Sep 2025 10:23:08 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1757067526915/38dceb0c-c5e7-4f8c-b663-9c566ba86b54.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>If you’ve ever used <a target="_blank" href="https://github.com/Guake/guake">Guake</a> (the drop-down terminal for GNOME) with a <strong>multi-monitor setup</strong>, you may have noticed that on the <strong>primary monitor</strong> the Guake window sometimes overflows beyond the visible screen area — 
usually extending past the <strong>top</strong> or <strong>right</strong> edges.</p>
<p>This makes Guake slightly annoying to use on the main monitor when you have more than one display connected.</p>
<hr />
<h2 id="heading-the-problem">The Problem</h2>
<p>On single-monitor setups, Guake aligns perfectly.<br />With multiple monitors, however, only the <strong>left</strong> edge aligns correctly; the window overflows past the <strong>top</strong> and <strong>right</strong> of the visible area.</p>
<p>On the <strong>primary monitor</strong> (<code>monitor == 0</code>), the window width and vertical alignment can push Guake partially off-screen.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1757066807859/7f9ee5a6-7ae9-491b-a12e-f45337817a0d.jpeg" alt class="image--center mx-auto" /></p>
<hr />
<h2 id="heading-the-solution">The Solution</h2>
<p>After digging through the source code, I found the logic that calculates Guake’s window rectangle in <a target="_blank" href="https://github.com/Guake/guake/blob/master/guake/utils.py"><code>utils.py</code></a> inside the <code>RectCalculator.set_final_window_rect</code> method.</p>
<p>By default, Guake calculates width and height as a percentage of the monitor’s geometry, then applies horizontal and vertical alignment. But there’s <strong>no special handling for the primary monitor in multi-monitor setups</strong>.</p>
<p>The fix is a simple adjustment right before resizing/moving the window in <code>/usr/lib/python3/dist-packages/guake/utils.py</code>.</p>
<pre><code class="lang-python"><span class="hljs-comment"># check for multi monitor setups</span>
<span class="hljs-comment"># if multiple monitors are detected and the primary monitor is used,</span>
<span class="hljs-comment"># we need to adjust the window width and position</span>

<span class="hljs-comment"># log.debug("Number of monitors detected: %s", screen.get_n_monitors())</span>
<span class="hljs-comment"># log.debug("Current monitor: %s", monitor)</span>
<span class="hljs-comment"># log.debug("Primary monitor: %s", screen.get_primary_monitor())</span>

<span class="hljs-keyword">if</span> screen.get_n_monitors() &gt; <span class="hljs-number">1</span> <span class="hljs-keyword">and</span> monitor == <span class="hljs-number">0</span>:
    log.debug(<span class="hljs-string">"Adjustment for Primary monitor in multi-monitor env applied"</span>)
    window_rect.width -= <span class="hljs-number">70</span>          <span class="hljs-comment"># reduce width</span>
    window_rect.y += <span class="hljs-number">34</span>              <span class="hljs-comment"># increase top gap</span>
</code></pre>
<p>That’s it. With this patch:</p>
<ul>
<li><p>Guake stays within visible bounds on the primary monitor.</p>
</li>
<li><p>Multi-monitor setups behave correctly without affecting single-monitor mode.</p>
</li>
</ul>
<p>Without the patch, the top of the window is hidden behind the Activities bar and the right side extends beyond the visible area (I’m not sure why; the pixel calculations seem correct, but visually they aren’t).</p>
<p>The commented-out <code>log.debug</code> lines are meant as a starting point for more advanced tweaks: for example, if your primary monitor is not index 0, you can find its index with <code>screen.get_primary_monitor()</code> and use that variable instead of hard-coding <code>0</code>.</p>
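<p>If you want to experiment with the offsets, the core of the patch can be factored into a small, testable helper. This is a standalone sketch with a hypothetical function name; the real change lives inside <code>RectCalculator.set_final_window_rect</code> in <code>utils.py</code>:</p>

```python
def adjust_rect(width, y, n_monitors, monitor, primary=0,
                width_trim=70, top_gap=34):
    """Shrink the window width and push it down, but only on the
    primary monitor of a multi-monitor setup (the patch's logic)."""
    if n_monitors > 1 and monitor == primary:
        return width - width_trim, y + top_gap
    return width, y
```

<p>The 70 px and 34 px values match the patch above, but they depend on your panel and monitor layout, so expect to tune them.</p>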
<p>While testing the fix, the simplest approach is to restart Guake from another terminal with <code>pkill guake &amp;&amp; guake -v</code>; there’s no need to delete <code>__pycache__</code>.</p>
<p>Here’s how it looks with the fix.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1757066862559/82f65624-2be5-43ef-8236-89c4527a1e3a.jpeg" alt class="image--center mx-auto" /></p>
<hr />
<h2 id="heading-making-it-permanent">Making It Permanent?</h2>
<p>If you edit <code>/usr/lib/python3/dist-packages</code>, your changes will be <strong>overwritten</strong> when Guake updates via your package manager.</p>
<p>Safer approaches:</p>
<ol>
<li><p><strong>Run from source</strong> – clone the Guake repo, apply the patch, and launch it directly.</p>
</li>
<li><p><strong>Use a Python venv</strong> – install Guake in a virtual environment and patch it there.</p>
</li>
</ol>
<p>I don’t recommend either: if the issue reappears after an update, re-applying this four-line change to one Python file seems like the simplest option while staying on your package manager’s version.</p>
<hr />
<h2 id="heading-final-thoughts">Final Thoughts</h2>
<p>Sometimes the best fixes are small. Just a couple of lines in <code>utils.py</code> completely solved the misalignment issue for me in a multi-monitor setup. If you’re facing the same problem, give this tweak a try.</p>
]]></content:encoded></item><item><title><![CDATA[Mastering Monorepos Part 3: Points to consider choosing between Nx and Turborepo]]></title><description><![CDATA[📚 Use Cases

Nx excels in scenarios where a comprehensive, integrated workspace is preferred, offering simplicity and shared configurations.

Turborepo shines when modularity and parallel development are paramount, catering to projects with diverse,...]]></description><link>https://blog.phreakyphoenix.tech/mastering-monorepos-part-3-points-to-consider-choosing-between-nx-and-turborepo</link><guid isPermaLink="true">https://blog.phreakyphoenix.tech/mastering-monorepos-part-3-points-to-consider-choosing-between-nx-and-turborepo</guid><category><![CDATA[monorepo]]></category><category><![CDATA[software development]]></category><category><![CDATA[Software Engineering]]></category><category><![CDATA[consulting]]></category><category><![CDATA[software architecture]]></category><category><![CDATA[System Architecture]]></category><category><![CDATA[System Design]]></category><dc:creator><![CDATA[Aditya Jyoti Paul]]></dc:creator><pubDate>Tue, 30 Jan 2024 15:49:37 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1706629146427/1c55add5-e1d6-4dbd-8f68-20f1b285d65e.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h3 id="heading-use-cases">📚 Use Cases</h3>
<ol>
<li><p>Nx excels in scenarios where a comprehensive, integrated workspace is preferred, offering simplicity and shared configurations.</p>
</li>
<li><p>Turborepo shines when modularity and parallel development are paramount, catering to projects with diverse, independent packages.</p>
</li>
</ol>
<h3 id="heading-which-monorepo-to-choose">🤔 Which Monorepo to Choose</h3>
<ol>
<li><p>If you want the integrated approach and prefer using generators for almost everything, controlling the build-test-deploy pipeline at a granular level, choose Nx.</p>
</li>
<li><p>Turborepo is easier to get up and running and is good enough for smaller projects without a lot of shared code and dependencies.</p>
</li>
<li><p>The learning curve for Nx is considerably steeper than Turborepo's. Anecdotally, it took me 15 minutes to set up my first Turborepo, and 2.5 hours with the Nx integrated approach.</p>
</li>
</ol>
<h3 id="heading-some-other-observations">💡 Some other observations</h3>
<ol>
<li><p>The free tier of both is good enough, but Nx's pricing is more transparent.</p>
</li>
<li><p>Both Nx Cloud and Turborepo support remote and local caching, which speeds up local development immensely.</p>
</li>
<li><p>Currently, all Phoenix HQ products like PhoenixTrack and GraFin use the Nx integrated style, and all our blogs use Turborepo's package-based style; with monorepos, CI/CD on our products took 40% less time on average.</p>
</li>
<li><p>Learning to use Nx generators efficiently is no piece of cake; it takes a few days to get used to, but the improvements are worth it even for a small dev team.</p>
</li>
</ol>
<p>I put a lot of love and effort into this monorepo series; please like and share so it reaches more people.</p>
<p>I'd love to hear about your experiences with monorepos in the comments.</p>
<p>Keep building!</p>
]]></content:encoded></item><item><title><![CDATA[Mastering Monorepos Part 2: Nx vs. Turborepo- A Deeper Dive into Package Management! 🚀]]></title><description><![CDATA[In the vibrant realm of monorepos, Nx and Turborepo stand out for their distinctive approaches to package management. I'll share my experience unravelling the intricacies of integrated and package-based styles in both platforms.
NX supports both inte...]]></description><link>https://blog.phreakyphoenix.tech/mastering-monorepos-part-2-nx-vs-turborepo-a-deeper-dive-into-package-management</link><guid isPermaLink="true">https://blog.phreakyphoenix.tech/mastering-monorepos-part-2-nx-vs-turborepo-a-deeper-dive-into-package-management</guid><category><![CDATA[monorepo]]></category><category><![CDATA[software development]]></category><category><![CDATA[Software Engineering]]></category><category><![CDATA[consulting]]></category><category><![CDATA[software architecture]]></category><category><![CDATA[System Architecture]]></category><category><![CDATA[System Design]]></category><dc:creator><![CDATA[Aditya Jyoti Paul]]></dc:creator><pubDate>Tue, 30 Jan 2024 15:47:08 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1706629495887/e085c184-f73e-40da-ae9c-95ed6d1fb43c.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In the vibrant realm of monorepos, Nx and Turborepo stand out for their distinctive approaches to package management. I'll share my experience unravelling the intricacies of integrated and package-based styles in both platforms.</p>
<p>NX supports both integrated and package-based styles. The former integrates projects within a unified workspace, fostering collaboration and shared configurations. TurboRepo adopts a package-based style, emphasizing modularity and independent versioning for optimized parallel development.</p>
<h3 id="heading-integrated-vs-package-based">🔄 Integrated vs Package-based</h3>
<h4 id="heading-nx-the-integrated-approach-also-supports-the-package-based-approach">🔗 NX: The Integrated Approach (also supports the package-based approach)</h4>
<ol>
<li><p><strong>Collaboration made easy</strong>: NX fosters collaboration by consolidating related projects into a unified workspace.</p>
</li>
<li><p><strong>Shared configurations</strong>: Developers benefit from shared configurations, ensuring consistency in tooling, testing, and linting across projects.</p>
</li>
<li><p><strong>Flexibility with granular commands</strong>: NX commands seamlessly operate across the entire workspace or specific projects, offering developers flexibility and control.</p>
</li>
<li><p><strong>Holistic NX workspaces</strong>: In NX, projects are organized within an "Nx Workspace," encouraging a holistic view of the entire codebase.</p>
</li>
<li><p><strong>Reusable Components</strong> with Nx Libraries: Nx Libraries facilitate code sharing, empowering developers to create reusable components, services, and utilities.</p>
</li>
</ol>
<h4 id="heading-turborepo-the-package-based-approach">📦 Turborepo: The Package-based Approach</h4>
<ol>
<li><p><strong>Modular</strong>: TurboRepo centers around a package-based style, emphasizing modularity to enhance scalability and performance.</p>
</li>
<li><p><strong>Independent Versioning</strong>: TurboRepo allows independent versioning for packages, providing granular control over dependencies.</p>
</li>
<li><p><strong>Parallel Development Capabilities</strong>: Developers can concurrently work on different packages in TurboRepo, enabling parallel development without unnecessary coupling.</p>
</li>
<li><p><strong>Dynamic workspaces</strong>: TurboRepo introduces dynamic workspaces, allowing developers to focus on a specific set of packages without the need for an all-encompassing workspace.</p>
</li>
<li><p><strong>Efficient Build System</strong>: TurboRepo optimizes builds by selectively rebuilding affected packages, contributing to faster development cycles.</p>
</li>
</ol>
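<p>The "selectively rebuilding affected packages" idea boils down to walking the dependency graph: anything that changed, or that depends (directly or transitively) on something that changed, must rebuild. Here is a toy sketch of that idea in Python; it is an illustration only, not how Nx or Turborepo actually implement it:</p>

```python
def affected(graph, changed):
    """Return every package that must rebuild.
    graph maps each package to the set of its direct dependencies;
    changed is the set of packages whose sources were edited."""
    result = set(changed)
    grew = True
    while grew:
        grew = False
        for pkg, deps in graph.items():
            # A package is affected if any of its deps is affected.
            if pkg not in result and deps & result:
                result.add(pkg)
                grew = True
    return result
```

<p>With <code>graph = {"app": {"ui"}, "ui": {"core"}, "core": set(), "docs": set()}</code>, a change to <code>core</code> marks <code>core</code>, <code>ui</code>, and <code>app</code> as affected while <code>docs</code> is skipped, which is exactly why package-based builds stay fast.</p>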
<p>Also check out the comparison from Nx <a target="_blank" href="https://nx.dev/concepts/integrated-vs-package-based">here</a>.</p>
<blockquote>
<p>The integrated style is more convenient for projects which benefit from dependencies being managed centrally from the package.json in the root, whereas other projects which require different versions of the same package would be more suitable for the package-based approach.</p>
</blockquote>
<p>It's very easy to switch from the integrated style to the package-based style but significantly harder vice versa.</p>
<p>In the next article of this series, I'll share some use-cases and how I decide on the appropriate solution.<br />Comment and let me know which monorepo solution you prefer and why.<br />Please like and share so it reaches more people.</p>
<p>Useful links:</p>
<ol>
<li><p>Learn more about monorepos on <a target="_blank" href="https://monorepo.tools/">monorepo.tools</a>.</p>
</li>
<li><p>Check out <a target="_blank" href="https://nx.dev/">nx.dev</a> and <a target="_blank" href="https://turbo.build/">turbo.build</a>.</p>
</li>
</ol>
]]></content:encoded></item><item><title><![CDATA[Mastering Monorepos Part 1: Pros, Cons, Top Solutions]]></title><description><![CDATA[Monorepos have become a hot topic in the software development community, revolutionizing the way teams manage their codebases. Let's dive into the pros and cons and explore some popular solutions!
Pros

Code Sharing: Monorepos enable seamless code sh...]]></description><link>https://blog.phreakyphoenix.tech/mastering-monorepos-decide-between-nx-turborepo</link><guid isPermaLink="true">https://blog.phreakyphoenix.tech/mastering-monorepos-decide-between-nx-turborepo</guid><category><![CDATA[monorepo]]></category><category><![CDATA[software development]]></category><category><![CDATA[Software Engineering]]></category><category><![CDATA[consulting]]></category><category><![CDATA[software architecture]]></category><category><![CDATA[System Architecture]]></category><category><![CDATA[System Design]]></category><dc:creator><![CDATA[Aditya Jyoti Paul]]></dc:creator><pubDate>Tue, 30 Jan 2024 15:05:56 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1706626926177/ef959e60-d622-47f3-a20d-4fd3392df319.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Monorepos have become a hot topic in the software development community, revolutionizing the way teams manage their codebases. Let's dive into the pros and cons and explore some popular solutions!</p>
<h3 id="heading-pros"><strong>Pros</strong></h3>
<ol>
<li><p><strong>Code Sharing</strong>: Monorepos enable seamless code sharing across projects.</p>
</li>
<li><p><strong>Consistent Builds</strong>: Centralized configuration ensures consistent builds.</p>
</li>
<li><p><strong>Easier Refactoring</strong>: Changes across projects are easier to manage.</p>
</li>
<li><p><strong>Atomic Commits</strong>: Facilitates atomic commits, enhancing version control.</p>
</li>
<li><p><strong>Streamlined CI/CD</strong>: Unified pipelines simplify continuous integration and deployment.</p>
</li>
<li><p><strong>Unified Builds</strong>: Don't build the same thing more than once.</p>
</li>
</ol>
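<p>The "don't build the same thing more than once" benefit usually comes from content-addressed caching: hash a task's inputs, and if that hash has been seen before, replay the cached output instead of rebuilding. Here is a toy illustration in Python; real tools like Nx and Turborepo also hash configuration, environment variables, and dependency outputs:</p>

```python
import hashlib

def cache_key(task, source_files):
    """Toy build-cache key: identical inputs => identical key => cache hit.
    source_files maps file name to file contents."""
    h = hashlib.sha256(task.encode())
    for name in sorted(source_files):  # sort for a stable hash
        h.update(name.encode())
        h.update(source_files[name].encode())
    return h.hexdigest()
```

<p>Two runs of <code>cache_key("build", {"main.ts": "..."})</code> with identical contents produce the same key, so the second run can be served from cache; change one byte of input and the key changes, forcing a rebuild.</p>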
<h3 id="heading-cons"><strong>Cons</strong></h3>
<ol>
<li><p><strong>Learning Curve</strong>: Adopting monorepos may require a learning curve.</p>
</li>
<li><p><strong>Increased Repository Size</strong>: Larger repositories may impact cloning and storage.</p>
</li>
<li><p><strong>Tooling Complexity</strong>: Managing monorepos often involves sophisticated tooling.</p>
</li>
<li><p><strong>Dependency Management</strong>: Dependencies can be challenging to handle.</p>
</li>
<li><p><strong>Risk of Coupling</strong>: Close relationships between projects may lead to tight coupling.</p>
</li>
</ol>
<h3 id="heading-monorepo-solutions"><strong>Monorepo Solutions</strong></h3>
<ol>
<li><p><strong>Bazel</strong>: Google's build system, focused on build and test automation for large, multi-language projects.</p>
</li>
<li><p><strong>Lage</strong>: A modern and extensible monorepo build system for TypeScript projects.</p>
</li>
<li><p><strong>Moon</strong>: Designed for simplicity and scalability, making monorepo management a breeze.</p>
</li>
<li><p><strong>NX</strong>: Powerful tooling, extensibility, and framework integration; ideal for teams working with Angular, React, and Node.js.</p>
</li>
<li><p><strong>Pants</strong>: A scalable and flexible build system for monorepos, supporting multiple languages.</p>
</li>
<li><p><strong>Rush</strong>: Developed by Microsoft, designed for TypeScript projects.</p>
</li>
<li><p><strong>TurboRepo</strong>: Known for its efficient handling of codebases, TurboRepo offers optimization benefits in large-scale projects.</p>
</li>
</ol>
<h3 id="heading-highlighting-nx-and-turborepo"><strong>Highlighting NX and TurboRepo</strong></h3>
<p>🚀 <strong>NX</strong>: Ideal for teams working with Angular, React, and Node.js. NX provides powerful tooling, extensibility, and integration with popular frameworks. Its focus on developer experience and comprehensive toolset makes it a solid choice for managing monorepos of various sizes.</p>
<p>🔍 <strong>TurboRepo</strong>: Recognized for its efficient management of codebases, TurboRepo offers optimization features for handling large projects effectively. TurboRepo stands out for its simplicity and ease, especially if you're in the Vercel ecosystem. 💻</p>
<p>We at <a target="_blank" href="https://phoenixhq.space">Phoenix HQ</a> chose Nx as our monorepo solution (in the integrated style), along with an eslint/tsconfig setup with Next.js, and it has proved to be a gamechanger. I continue to use Turborepo in some personal projects; as far as I know it only supports the package-based approach, and it is super convenient to set up and use.</p>
<p>In part 2 of this series, I'll compare Nx and Turborepo in more depth, covering how the package-based and integrated styles differ and how that impacts development.</p>
<p>Useful links:</p>
<ol>
<li><p>Learn more about monorepos on <a target="_blank" href="https://monorepo.tools/">monorepo.tools</a>.</p>
</li>
<li><p>Check out <a target="_blank" href="https://nx.dev/">nx.dev</a> and <a target="_blank" href="https://turbo.build/">turbo.build</a>.</p>
</li>
</ol>
]]></content:encoded></item><item><title><![CDATA[AmzTrack: Track Amazon Prices with Python]]></title><description><![CDATA[Hello there, fellow savvy shoppers and Pythonistas! If you've ever wanted to keep an eye on the price of a product on Amazon without manually refreshing the page every five minutes, this blog post is for you. Today, I'll share with you a simple Pytho...]]></description><link>https://blog.phreakyphoenix.tech/amztrack-track-amazon-prices-with-python</link><guid isPermaLink="true">https://blog.phreakyphoenix.tech/amztrack-track-amazon-prices-with-python</guid><category><![CDATA[Amazon]]></category><category><![CDATA[shopping]]></category><category><![CDATA[tracking]]></category><category><![CDATA[telegram]]></category><category><![CDATA[pricetrends]]></category><dc:creator><![CDATA[Aditya Jyoti Paul]]></dc:creator><pubDate>Sat, 29 Jul 2023 11:59:39 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1690630067308/4cf29499-0077-4e37-8918-602fe0cf5e5e.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Hello there, fellow savvy shoppers and Pythonistas! If you've ever wanted to keep an eye on the price of a product on Amazon without manually refreshing the page every five minutes, this blog post is for you. Today, I'll share with you a simple Python script that I created, which will allow you to track Amazon prices right from your laptop or server, regardless of the operating system you use.</p>
<p><em>And the best part? It's free and easy to use.</em></p>
<p>So, let's get started!</p>
<h3 id="heading-part-1-meet-your-new-friend-botfatherhttpstmebotfather"><strong>Part 1: Meet your new friend,</strong> @<a target="_blank" href="https://t.me/botfather">BotFather</a></h3>
<p>Before we dive into the code, you'll need a Telegram bot token if you want price updates delivered to Telegram. Don't worry; getting one is simple. Open Telegram, search for @<a target="_blank" href="https://t.me/botfather">BotFather</a>, and send it the <code>/newbot</code> command. It will create a new bot and provide you with a token. Keep this token safe; you will need it to interact with your bot. Replace <code>&lt;TELEGRAM BOT TOKEN&gt;</code> in the script with your token.</p>
<h3 id="heading-part-2-the-price-tracking-alternatives"><strong>Part 2: The Price Tracking Alternatives</strong></h3>
<p>There are a few websites out there like <a target="_blank" href="http://pricehistoryapp.com">pricehistoryapp.com</a> and <a target="_blank" href="http://pricebefore.com">pricebefore.com</a> that track prices. However, these services aren't always reliable. They might not update in real time, they may only work with certain products, or they could stop functioning if the website changes its layout.</p>
<p>Web scraping services like <a target="_blank" href="http://webscraping.ai">webscraping.ai</a> can provide more precise and customizable results, but they also come with a cost. If all you need is a simple tracker, it might not be worth paying for these services.</p>
<h3 id="heading-part-3-the-art-of-web-scraping"><strong>Part 3: The Art of Web Scraping</strong></h3>
<p>Here's where things get interesting. To fetch the prices, our script uses a method called "web scraping", which is a way to extract data from websites. Amazon does not like this and uses techniques such as IP blocking and CAPTCHA challenges to prevent it. Using a residential IP address can help bypass some of these obstacles, but be aware that excessive or inappropriate scraping can still lead to your IP getting blocked.</p>
<p>So, remember, with great power comes great responsibility. Use your scraping powers wisely!</p>
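<p>One easy way to scrape more responsibly is to randomize the polling interval rather than hitting the page on an exact clock. This is a small sketch of my own suggestion, not part of the original script; the 5-minute baseline matches the script's loop, the jitter value is an assumption:</p>

```python
import random

def next_delay(base=300, jitter=120):
    """Seconds to wait before the next price check: a 5-minute
    baseline plus up to 2 minutes of random jitter."""
    return base + random.uniform(0, jitter)
```

<p>In the main loop you would then call <code>sleep(next_delay())</code> in place of the fixed <code>sleep(60*5)</code>.</p>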
<h3 id="heading-part-4-see-the-magic"><strong>Part 4: See the Magic</strong></h3>
<ol>
<li><p><strong>Install Python and the necessary libraries</strong>: First, you need to have Python installed on your computer. If you don't have it, download it from <a target="_blank" href="https://www.python.org/downloads/"><strong>here</strong></a>. Once Python is installed, open your terminal or command prompt and type the following commands to install the necessary Python libraries: <code>pip install requests beautifulsoup4 urllib3==2.0.4</code>.</p>
<p> You'd also want <code>pip install fastapi uvicorn</code> if you want to run it as a server.</p>
</li>
<li><p><strong>Get your Telegram Bot Token and Chat ID</strong>: Follow the steps described in the blog post. Once you have these, keep them handy as you will need to put them into your Python script.</p>
</li>
<li><p><strong>Get the Product URL</strong>: Go to the Amazon product page you want to track and copy its URL.</p>
</li>
<li><p><strong>Get Telegram chat_id and optionally Amazon browser cookies:</strong></p>
<p> For chat_id, start by sending your bot a message in Telegram. Then, run the provided script with your bot token. This will return your chat_id in the response.</p>
<pre><code class="lang-python"> <span class="hljs-keyword">import</span> requests
 TOKEN = <span class="hljs-string">"&lt;YOUR TELEGRAM TOKEN HERE&gt;"</span>
 url = <span class="hljs-string">f"https://api.telegram.org/bot<span class="hljs-subst">{TOKEN}</span>/getUpdates"</span>
 print(requests.get(url).json())
</code></pre>
<p> Cookies are optional, even if you're using the server implementation. You can obtain them by opening the product page in your browser (Chrome, for example), copying the cookies, and formatting them into a Python dictionary. Replace the cookie dictionary in the script with your own.<br /> Tip: I selected all the cookies and asked GPT to format them into a Python dict. 😉</p>
</li>
<li><p><strong>Edit the Python script</strong>: Now you need to replace some placeholders in the script with the values you just collected:</p>
<ul>
<li><p>Replace <code>&lt;TELEGRAM BOT TOKEN&gt;</code> with your Telegram Bot Token.</p>
</li>
<li><p>Replace <code>&lt;CHAT_ID&gt;</code> with your chat_id.</p>
</li>
<li><p>Replace <code>&lt;Your Product URL&gt;</code> with the product URL.</p>
</li>
<li><p>Replace <code>&lt;Your Cookie&gt;</code> with the cookies dictionary you have collected (optional)</p>
</li>
<li><p>Set the <code>TARGET</code> variable to the price point at which you want to be alerted.</p>
</li>
</ul>
</li>
<li><p><strong>Run the script</strong>: Save the script as a <code>.py</code> file (for example, <code>base.py</code>). Now, go to your terminal or command prompt, navigate to the directory where you saved the script, type <code>python base.py</code> and hit Enter. The script should now run and send you updates on Telegram every 5 minutes!</p>
</li>
</ol>
<p>Remember, this script will stop running if you close the terminal or shut down your computer. If you want the script to run continuously, you will have to set up your computer to prevent it from sleeping, or run the script on a server.</p>
<p>If you're on a Mac, use these commands to disable sleep mode and run the script as a continuous process:</p>
<pre><code class="lang-plaintext">sudo pmset -b sleep 0; sudo pmset -b disablesleep 1
nohup caffeinate python base.py &amp;
</code></pre>
<p>When you're finished, re-enable sleep mode with:</p>
<pre><code class="lang-plaintext">sudo pmset -b sleep 5; sudo pmset -b disablesleep 0
</code></pre>
<p>You can do something similar on Linux, and on Windows you can disable sleep and network sleep from the Power Management settings.</p>
<p>And there you have it! You're now set to never miss a price drop on your favorite Amazon product again. Happy shopping and happy coding!</p>
<h3 id="heading-part-5-complete-code-with-brief-explanation"><strong>Part 5: Complete Code with Brief Explanation</strong></h3>
<p>I've created two Python scripts for you. One is a base script that you can run from any device on your home network. The other is a FastAPI script if you want to run it as a server process. Here's what they do:</p>
<h4 id="heading-base-script-you-probably-only-need-this">Base Script <em>(you probably only need this)</em></h4>
<p>First, the script uses a <code>requests.get</code> call to fetch the HTML content of the Amazon product page. We send along headers that mimic a browser to prevent getting blocked by Amazon. It then uses BeautifulSoup to parse the HTML and extract the prices from specific tags.</p>
<p>The script checks two prices for verification. If the prices match, it sends a message to a specified Telegram chat with the current price. If the prices do not match, it sends an alert. If the price drops below a certain target, it sends a different alert.</p>
<pre><code class="lang-python"><span class="hljs-comment">## Base script</span>

<span class="hljs-keyword">import</span> requests
<span class="hljs-keyword">from</span> bs4 <span class="hljs-keyword">import</span> BeautifulSoup
<span class="hljs-keyword">from</span> time <span class="hljs-keyword">import</span> sleep

TOKEN = <span class="hljs-string">"&lt;TELEGRAM BOT TOKEN&gt;"</span>
CHAT_ID = <span class="hljs-string">"&lt;YOUR CHAT ID&gt;"</span>
TELEGRAM_API_URL = <span class="hljs-string">f"https://api.telegram.org/bot<span class="hljs-subst">{TOKEN}</span>/sendMessage"</span>
PRODUCT_URL = <span class="hljs-string">"https://www.amazon.in/POCO-Pro-Yellow-128GB-Storage/dp/B0B6GDLMQK/ref=sr_1_1?crid=3EB9ZPVILWI2J&amp;keywords=poco+x4+pro+5g&amp;qid=1690330911&amp;sprefix=poco+x4+pro%2Caps%2C243&amp;sr=8-1"</span>
TARGET = <span class="hljs-number">17000</span>  <span class="hljs-comment">#INSERT YOUR TARGET PRICE HERE</span>

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">get_prices</span>(<span class="hljs-params">url=PRODUCT_URL</span>):</span>
    headers = {<span class="hljs-string">"User-Agent"</span>: <span class="hljs-string">"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101 Firefox/91.0"</span>}
    response = requests.get(url, headers=headers)
    soup = BeautifulSoup(response.content, <span class="hljs-string">"html.parser"</span>)

    input_tag = soup.find(<span class="hljs-string">'input'</span>, {<span class="hljs-string">'id'</span>: <span class="hljs-string">'attach-base-product-price'</span>})
    price_primary = float(input_tag[<span class="hljs-string">'value'</span>])

    price_tag = soup.find(<span class="hljs-string">'span'</span>, {<span class="hljs-string">'class'</span>: <span class="hljs-string">'a-price-whole'</span>})
    price_whole = float(price_tag.text.replace(<span class="hljs-string">','</span>, <span class="hljs-string">''</span>))
    <span class="hljs-keyword">return</span> price_primary, price_whole

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">send_telegram_message</span>(<span class="hljs-params">message, chat_id = CHAT_ID</span>):</span>
    <span class="hljs-string">"""Sends a message to a specified chat in Telegram."""</span>
    data = {
        <span class="hljs-string">'chat_id'</span>: chat_id,
        <span class="hljs-string">'text'</span>: message
    }
    response = requests.post(TELEGRAM_API_URL, data=data)
    <span class="hljs-keyword">return</span> response.json()

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">main</span>(<span class="hljs-params">args=None</span>):</span>
    <span class="hljs-string">"""The main routine."""</span>
    price_primary, price_whole = get_prices()
    <span class="hljs-keyword">if</span> price_primary == price_whole:
        message = <span class="hljs-string">f'Price of product is <span class="hljs-subst">{price_primary}</span>'</span>
        <span class="hljs-keyword">if</span> price_primary &lt;= TARGET:
            message = <span class="hljs-string">f'ALERT PRICE DROP Price of product is <span class="hljs-subst">{price_primary}</span>'</span>
    <span class="hljs-keyword">else</span>:
        message = <span class="hljs-string">f'Price mismatch for product between <span class="hljs-subst">{price_primary}</span> and <span class="hljs-subst">{price_whole}</span>'</span>
        <span class="hljs-keyword">if</span> price_primary &lt;= TARGET <span class="hljs-keyword">or</span> price_whole &lt;= TARGET:
            message = <span class="hljs-string">f'ALERT PRICE DROP WITH MISMATCH Price of product is <span class="hljs-subst">{min(price_primary,price_whole)}</span>'</span>
    send_telegram_message(message)

<span class="hljs-keyword">if</span> __name__ == <span class="hljs-string">"__main__"</span>:
    <span class="hljs-keyword">while</span> <span class="hljs-literal">True</span>:
        main()
        sleep(<span class="hljs-number">60</span>*<span class="hljs-number">5</span>)
</code></pre>
<h4 id="heading-server-script">Server Script</h4>
<p>The server script does the same thing, but it's designed to run as a FastAPI process. It uses cookies to maintain a session, which you will need to replace with your own. The price check runs every time the <code>check_price</code> function is called, which happens whenever the <code>/</code> route is hit.</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> fastapi <span class="hljs-keyword">import</span> FastAPI
<span class="hljs-keyword">import</span> requests
<span class="hljs-keyword">from</span> bs4 <span class="hljs-keyword">import</span> BeautifulSoup

app = FastAPI()

TOKEN = &lt;TELEGRAM BOT TOKEN&gt;
CHAT_ID = &lt;YOUR CHAT ID&gt;
TELEGRAM_API_URL = <span class="hljs-string">f"https://api.telegram.org/bot<span class="hljs-subst">{TOKEN}</span>/sendMessage"</span>
PRODUCT_URL = <span class="hljs-string">"https://www.amazon.in/POCO-Pro-Yellow-128GB-Storage/dp/B0B6GDLMQK/ref=sr_1_1?crid=3EB9ZPVILWI2J&amp;keywords=poco+x4+pro+5g&amp;qid=1690330911&amp;sprefix=poco+x4+pro%2Caps%2C243&amp;sr=8-1"</span>
TARGET = <span class="hljs-number">17000</span>  <span class="hljs-comment">#INSERT YOUR TARGET PRICE HERE</span>

headers = {
        <span class="hljs-string">"User-Agent"</span>: <span class="hljs-string">"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) \
        Gecko/20100101 Firefox/91.0"</span>
    }

cookies = {
    <span class="hljs-string">"csm-hit"</span>: <span class="hljs-string">"tb:s-CXXNPATX9J2QGP81Q|1690349188202&amp;t:169034999&amp;adb:adblk_no"</span>,
    <span class="hljs-string">"session-token"</span>: <span class="hljs-string">"W842UvQcRIpJHNyfBrqtjds14fUwP7OoBTeIDrLCLpKks+uFGi6hZrn7H8DGUC8IJTdsfQsq6jgGzhGXho0oiA0TbhE9BHUAs8ZlJ2YsuUA4CMLLb6csSo/xAUwULgrLE6fVdZf/fwhHbBMt+XpuKnBoSSRHYe+VnWgrCxeN8cx7VXBD5gNf1+DbPnpcvF53DFaOBg+Zj0QN6KJsvZCsVdR/4pXJHllCvr0Y="</span>,
    <span class="hljs-string">"ubid-acbin"</span>: <span class="hljs-string">"247-6048695-6105069"</span>,
    <span class="hljs-string">"i18n-prefs"</span>: <span class="hljs-string">"INR"</span>,
    <span class="hljs-string">"lc-acbin"</span>: <span class="hljs-string">"en_IN"</span>,
    <span class="hljs-string">"session-id-time"</span>: <span class="hljs-string">"2083786201l"</span>,
    <span class="hljs-string">"session-id"</span>: <span class="hljs-string">"253-7064393-874941"</span>,
}

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">get_prices</span>(<span class="hljs-params">url=PRODUCT_URL</span>):</span>

    sess = requests.Session()
    sess.headers.update(headers)
    sess.cookies.update(cookies)
    response = sess.get(url)
    soup = BeautifulSoup(response.content, <span class="hljs-string">"html.parser"</span>)

    input_tag = soup.find(<span class="hljs-string">"input"</span>, {<span class="hljs-string">"id"</span>: <span class="hljs-string">"attach-base-product-price"</span>})
    price_primary = float(input_tag[<span class="hljs-string">"value"</span>])

    price_tag = soup.find(<span class="hljs-string">"span"</span>, {<span class="hljs-string">"class"</span>: <span class="hljs-string">"a-price-whole"</span>})
    price_whole = float(price_tag.text.replace(<span class="hljs-string">","</span>, <span class="hljs-string">""</span>))
    <span class="hljs-keyword">return</span> price_primary, price_whole

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">send_telegram_message</span>(<span class="hljs-params">message, chat_id=CHAT_ID</span>):</span>
    <span class="hljs-string">"""Sends a message to a specified chat in Telegram."""</span>
    data = {<span class="hljs-string">"chat_id"</span>: chat_id, <span class="hljs-string">"text"</span>: message}
    response = requests.post(TELEGRAM_API_URL, data=data, timeout=<span class="hljs-number">20</span>)
    <span class="hljs-keyword">return</span> response.json()

<span class="hljs-meta">@app.get("/")</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">check_price</span>():</span>
    message = <span class="hljs-string">''</span>
    <span class="hljs-keyword">try</span>:
        price_primary, price_whole = get_prices()
        <span class="hljs-keyword">if</span> price_primary == price_whole:
            message = <span class="hljs-string">f"Price of product is <span class="hljs-subst">{price_primary}</span>"</span>
            <span class="hljs-keyword">if</span> price_primary &lt;= TARGET:
                message = <span class="hljs-string">f"ALERT PRICE DROP Price of product is <span class="hljs-subst">{price_primary}</span>"</span>
        <span class="hljs-keyword">else</span>:
            message = (
                <span class="hljs-string">f"Price mismatch for product between <span class="hljs-subst">{price_primary}</span> and <span class="hljs-subst">{price_whole}</span>"</span>
            )
            <span class="hljs-keyword">if</span> price_primary &lt;= TARGET <span class="hljs-keyword">or</span> price_whole &lt;= TARGET:
                message = <span class="hljs-string">f"ALERT PRICE DROP WITH MISMATCH Price of product is <span class="hljs-subst">{min(price_primary,price_whole)}</span>"</span>
    <span class="hljs-keyword">except</span> Exception <span class="hljs-keyword">as</span> e:
        message = <span class="hljs-string">f"Error: <span class="hljs-subst">{str(e)}</span>"</span>
    <span class="hljs-keyword">finally</span>:
        send_telegram_message(message)
        <span class="hljs-keyword">return</span> {<span class="hljs-string">"message"</span>: message}
</code></pre>
<h3 id="heading-part-6-detailed-code-walkthrough">Part 6: Detailed Code Walkthrough</h3>
<p>Here we walk through the code line by line; feel free to skip this section if you're not looking to add features to the tool. But if you do want to customize it further, the sky is the limit, so let's delve in.</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> requests
<span class="hljs-keyword">from</span> bs4 <span class="hljs-keyword">import</span> BeautifulSoup
<span class="hljs-keyword">from</span> time <span class="hljs-keyword">import</span> sleep
</code></pre>
<p>Here, we're importing necessary Python libraries. <code>requests</code> allows us to send HTTP requests, <code>BeautifulSoup</code> is used for pulling data out of HTML and XML files, and <code>sleep</code> from <code>time</code> will help us to pause the execution of the script for a specified amount of time.</p>
<pre><code class="lang-python">TOKEN = &lt;TELEGRAM BOT TOKEN&gt;
CHAT_ID = &lt;YOUR CHAT ID&gt;
TELEGRAM_API_URL = <span class="hljs-string">f"https://api.telegram.org/bot<span class="hljs-subst">{TOKEN}</span>/sendMessage"</span>
PRODUCT_URL = <span class="hljs-string">"https://www.amazon.in/POCO-Pro-Yellow-128GB-Storage/dp/B0B6GDLMQK/ref=sr_1_1?crid=3EB9ZPVILWI2J&amp;keywords=poco+x4+pro+5g&amp;qid=1690330911&amp;sprefix=poco+x4+pro%2Caps%2C243&amp;sr=8-1"</span>
TARGET = <span class="hljs-number">17000</span>  <span class="hljs-comment">#INSERT YOUR TARGET PRICE HERE</span>
</code></pre>
<p>In these lines, we're setting up our variables. <code>TOKEN</code> is your Telegram Bot Token, <code>CHAT_ID</code> is the ID of the chat that should receive the alerts, <code>TELEGRAM_API_URL</code> is the Telegram API endpoint for sending messages, <code>PRODUCT_URL</code> is the URL of the Amazon product you want to track, and <code>TARGET</code> is the price point at which you want to be alerted.</p>
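<p>Rather than pasting the token and chat ID directly into the script, you might prefer to read them from environment variables so the secrets never land in version control. A minimal sketch; the variable names <code>TELEGRAM_BOT_TOKEN</code> and <code>TELEGRAM_CHAT_ID</code> are my own choice, not anything the bot requires:</p>

```python
import os

# Hypothetical environment-variable names; the fallbacks are placeholders.
TOKEN = os.environ.get("TELEGRAM_BOT_TOKEN", "dummy-token")
CHAT_ID = os.environ.get("TELEGRAM_CHAT_ID", "0")
TELEGRAM_API_URL = f"https://api.telegram.org/bot{TOKEN}/sendMessage"

print(TELEGRAM_API_URL)
```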
<pre><code class="lang-python"><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">get_prices</span>(<span class="hljs-params">url=PRODUCT_URL</span>):</span>
    headers = {<span class="hljs-string">"User-Agent"</span>: <span class="hljs-string">"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101 Firefox/91.0"</span>}
    response = requests.get(url, headers=headers)
    soup = BeautifulSoup(response.content, <span class="hljs-string">"html.parser"</span>)
</code></pre>
<p>This is the start of the <code>get_prices</code> function. We define a headers dictionary with a <code>User-Agent</code> to mimic a real browser. We then send a GET request to the product URL and parse the HTML content with BeautifulSoup.</p>
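<p>If you'd rather not depend on <code>requests</code>, the same idea works with the standard library. Here's a sketch that only builds the request object, with no network call, to show the browser-like header riding along with it:</p>

```python
from urllib.request import Request

# The same browser-like User-Agent attached to a stdlib request object.
# Nothing is fetched here; this only demonstrates the header handling.
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) "
                  "Gecko/20100101 Firefox/91.0"
}
req = Request("https://www.amazon.in/dp/B0B6GDLMQK", headers=headers)
print(req.headers)
```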
<pre><code class="lang-python">input_tag = soup.find(<span class="hljs-string">'input'</span>, {<span class="hljs-string">'id'</span>: <span class="hljs-string">'attach-base-product-price'</span>})
price_primary = float(input_tag[<span class="hljs-string">'value'</span>])
</code></pre>
<p>Here, we're looking for an HTML 'input' tag with the ID 'attach-base-product-price'. This is where Amazon stores the base product price. We retrieve it and convert it to a float.</p>
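<p>Under the hood this is plain attribute lookup on a parsed tag. A stdlib-only sketch of the same extraction, fed a tiny hypothetical snippet of HTML rather than a real Amazon page:</p>

```python
from html.parser import HTMLParser

class PriceInputParser(HTMLParser):
    """Mimics soup.find('input', {'id': 'attach-base-product-price'})['value']."""
    def __init__(self):
        super().__init__()
        self.value = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "input" and attrs.get("id") == "attach-base-product-price":
            self.value = attrs.get("value")

parser = PriceInputParser()
# A made-up one-line stand-in for the real product page.
parser.feed('<input id="attach-base-product-price" value="17999.00">')
price_primary = float(parser.value)
print(price_primary)  # 17999.0
```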
<pre><code class="lang-python">price_tag = soup.find(<span class="hljs-string">'span'</span>, {<span class="hljs-string">'class'</span>: <span class="hljs-string">'a-price-whole'</span>})
price_whole = float(price_tag.text.replace(<span class="hljs-string">','</span>, <span class="hljs-string">''</span>))
<span class="hljs-keyword">return</span> price_primary, price_whole
</code></pre>
<p>Next, we're finding the span tag with the class 'a-price-whole' which contains the displayed price on the product page. We're extracting the text, removing any commas, converting it to a float and returning both prices.</p>
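<p>The comma removal matters because Amazon.in renders prices with digit grouping that <code>float()</code> can't digest directly. A small helper makes the cleanup explicit; the name <code>parse_price</code> is mine, not part of the original script:</p>

```python
def parse_price(text):
    """Strip whitespace and thousands separators, then convert to float."""
    return float(text.strip().replace(",", ""))

print(parse_price("17,999"))         # 17999.0
print(parse_price(" 1,17,999.00 "))  # 117999.0 (Indian digit grouping)
```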
<pre><code class="lang-python"><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">send_telegram_message</span>(<span class="hljs-params">message, chat_id = CHAT_ID</span>):</span>
    <span class="hljs-string">"""Sends a message to a specified chat in Telegram."""</span>
    data = {
        <span class="hljs-string">'chat_id'</span>: chat_id,
        <span class="hljs-string">'text'</span>: message
    }
    response = requests.post(TELEGRAM_API_URL, data=data)
    <span class="hljs-keyword">return</span> response.json()
</code></pre>
<p>This is the <code>send_telegram_message</code> function. It takes a message and a <code>chat_id</code> as inputs, builds a dictionary with these values, and sends a POST request to the Telegram API to deliver the message. We send messages by calling <code>send_telegram_message(message)</code>.</p>
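<p>For reference, the request that <code>requests.post</code> assembles is just a form-encoded body sent to the bot's <code>sendMessage</code> endpoint. A stdlib sketch of that encoding, with a made-up token and chat ID:</p>

```python
from urllib.parse import urlencode

# Hypothetical token and chat ID, purely for illustration.
TOKEN = "123456:ABC-example"
CHAT_ID = "987654321"

url = f"https://api.telegram.org/bot{TOKEN}/sendMessage"
payload = urlencode({"chat_id": CHAT_ID, "text": "ALERT PRICE DROP"})
print(url)
print(payload)  # chat_id=987654321&text=ALERT+PRICE+DROP
```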
<pre><code class="lang-python"><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">main</span>(<span class="hljs-params">args=None</span>):</span>
    <span class="hljs-string">"""The main routine."""</span>
    price_primary, price_whole = get_prices()
    <span class="hljs-keyword">if</span> price_primary == price_whole:
        message = <span class="hljs-string">f'Price of product is <span class="hljs-subst">{price_primary}</span>'</span>
        <span class="hljs-keyword">if</span> price_primary &lt;= TARGET:
            message = <span class="hljs-string">f'ALERT PRICE DROP Price of product is <span class="hljs-subst">{price_primary}</span>'</span>
    <span class="hljs-keyword">else</span>:
        message = <span class="hljs-string">f'Price mismatch for product between <span class="hljs-subst">{price_primary}</span> and <span class="hljs-subst">{price_whole}</span>'</span>
        <span class="hljs-keyword">if</span> price_primary &lt;= TARGET <span class="hljs-keyword">or</span> price_whole &lt;= TARGET:
            message = <span class="hljs-string">f'ALERT PRICE DROP WITH MISMATCH Price of product is <span class="hljs-subst">{min(price_primary,price_whole)}</span>'</span>
</code></pre>
<p>In the <code>main</code> function, we call the <code>get_prices</code> function to get the two price values.</p>
<p>We're checking if the base product price and displayed price are the same. If they are, we're creating a message with the price, and if it is less than or equal to the target, we're creating an alert message. If the prices are not the same, we're creating a message to notify about the mismatch and creating an alert message if either price is less than or equal to the target. You can customize the messages in the if-else block as you wish.</p>
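<p>If you do customize the messages, pulling this branching into a pure function makes each case easy to unit-test in isolation. A sketch; the name <code>build_message</code> is my own, not part of the original script:</p>

```python
def build_message(price_primary, price_whole, target):
    """Pure version of the branching above: returns the message text."""
    if price_primary == price_whole:
        if price_primary <= target:
            return f"ALERT PRICE DROP Price of product is {price_primary}"
        return f"Price of product is {price_primary}"
    # Prices disagree: alert if either one is at or below the target.
    if min(price_primary, price_whole) <= target:
        return (
            "ALERT PRICE DROP WITH MISMATCH Price of product is "
            f"{min(price_primary, price_whole)}"
        )
    return f"Price mismatch for product between {price_primary} and {price_whole}"

print(build_message(16500.0, 16500.0, 17000))
```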
<pre><code class="lang-python"><span class="hljs-keyword">if</span> __name__ == <span class="hljs-string">"__main__"</span>:
    <span class="hljs-keyword">while</span> <span class="hljs-literal">True</span>:
        main()
        sleep(<span class="hljs-number">60</span>*<span class="hljs-number">5</span>)
</code></pre>
<p>Finally, if this script is being run directly (and not imported as a module), we call the <code>main</code> function in an infinite loop, with a pause of 5 minutes (60*5 seconds) between each call. This keeps checking the prices and sending updates to Telegram. You can also remove the infinite loop and just call <code>main</code> once, handling the scheduling in a cron job if you'd prefer; that would also save a bit of OS resources.</p>
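<p>A middle ground between the busy loop and cron is the stdlib <code>sched</code> module, which keeps the scheduling in Python but makes the interval explicit. A sketch, functionally similar to the while loop above:</p>

```python
import sched
import time

# Each run of `job` re-schedules the next check 5 minutes later.
scheduler = sched.scheduler(time.monotonic, time.sleep)

def job():
    # The call to main() would go here.
    scheduler.enter(60 * 5, 1, job)

# To start: scheduler.enter(0, 1, job); scheduler.run()
```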
<h3 id="heading-conclusion">Conclusion</h3>
<p>And voila! You've just created your very own Python-based Amazon price tracker. With the power of Python and a little bit of your time, you've built a tool that can save you both time and money. No longer do you need to manually keep track of fluctuating prices or pay for pricey tracking services. Now, you can have updates delivered right to you via Telegram.</p>
<p>Whether you're a bargain hunter or just love tinkering with Python, this script gives you a new tool in your arsenal. Plus, you've learned valuable skills in web scraping, working with APIs, and scripting with Python. The best part? You can customize this script to fit your specific needs, whether it's tracking prices on different websites or setting different price thresholds.</p>
<p>I was able to help my girlfriend buy a mobile with this. See how the price jumps up at the end; that happened right after we made the purchase. We had set an alert at 17k and got a deal within 2 days of running AmzTrack.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1690606439534/1de493ff-7ae7-4824-b0f2-f3f3ef54e86b.jpeg" alt class="image--center mx-auto" /></p>
<p>Remember, this script is a starting point, and the sky is the limit when it comes to customizing it. Want to track multiple items at once or receive alerts via different messaging platforms or on-call? Go ahead and modify it to your heart's content. After all, that's the beauty of coding.</p>
<p>So, start saving money today with AmzTrack. It's a game-changer for online shopping. As always, happy coding and happy shopping!</p>
]]></content:encoded></item><item><title><![CDATA[What to expect in a management interview?]]></title><description><![CDATA[When it comes to a management interview, the expectations are high, and the interview process can be very thorough. The employer will be looking for the right combination of skills, experience, and personality traits to ensure that the candidate can ...]]></description><link>https://blog.phreakyphoenix.tech/what-to-expect-in-a-management-interview</link><guid isPermaLink="true">https://blog.phreakyphoenix.tech/what-to-expect-in-a-management-interview</guid><category><![CDATA[management]]></category><category><![CDATA[agile]]></category><category><![CDATA[interview]]></category><category><![CDATA[interview questions]]></category><category><![CDATA[teamwork]]></category><dc:creator><![CDATA[Aditya Jyoti Paul]]></dc:creator><pubDate>Fri, 24 Mar 2023 15:22:43 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1679667587200/192ca2c5-d7d8-4448-95c7-b7e21ebb131f.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>When it comes to a management interview, the expectations are high, and the interview process can be very thorough. The employer will be looking for the right combination of skills, experience, and personality traits to ensure that the candidate can effectively lead and manage a team.</p>
<p>In this article, I'll cover some general things to expect in an interview, key areas of preparation and some sample questions and answers. Since every management interview is different, the article will be generally applicable to most individuals but for more personalized guidance and support, you can book some time with me <a target="_blank" href="https://topmate.io/phreakyphoenix/201077">here</a>.</p>
<p>Here are some things to expect in a management interview.</p>
<ol>
<li><p><strong>Discussion of Management Style</strong>: One of the primary things that an interviewer will want to know is your management style. The interviewer may ask questions about how you have managed teams in the past, how you would handle different situations, and what your strengths and weaknesses are as a manager. The interviewer will also want to know if your management style aligns with the company culture.</p>
</li>
<li><p><strong>Questions about Leadership</strong>: Leadership is a crucial aspect of management, and the interviewer will want to know if you have the necessary skills to lead a team effectively. The interviewer may ask questions about your experience in leading teams, how you motivate your team, and how you handle conflicts within the team. They may also ask about your approach to delegation and decision-making.</p>
</li>
<li><p><strong>Experience and Education</strong>: The interviewer will want to know about your experience and education. They may ask questions about your previous roles, the size of the teams you have managed, and the type of projects you have overseen. The interviewer may also ask about any management-related education or training that you have completed.</p>
</li>
<li><p><strong>Problem-Solving Skills</strong>: Managers are often required to solve complex problems, and the interviewer will want to know if you have the necessary problem-solving skills. The interviewer may ask hypothetical questions that require you to come up with creative solutions to challenging problems.</p>
</li>
<li><p><strong>Communication Skills</strong>: Effective communication is a critical aspect of management, and the interviewer will want to know if you have the necessary communication skills. The interviewer may ask questions about your approach to communication, how you handle difficult conversations, and how you communicate with team members of different backgrounds and skill levels.</p>
</li>
<li><p><strong>Personal Attributes</strong>: In addition to technical skills and experience, the interviewer will also be interested in your personal attributes. They may ask questions about your ability to work under pressure, your ability to adapt to changing situations, and your ability to handle feedback.</p>
</li>
</ol>
<p>Here are some general tips for preparing for a management interview:</p>
<ol>
<li><p><strong>Research the company</strong>: Study the company’s history, mission, values, products or services, and culture. Familiarize yourself with its recent news and developments.</p>
</li>
<li><p><strong>Review the job description</strong>: Read the job description carefully and make a list of the skills, qualifications, and responsibilities required for the position. Think about how your experience and skills align with those listed.</p>
</li>
<li><p><strong>Prepare your examples</strong>: Prepare specific examples from your past experiences that demonstrate your skills and qualifications. Think about examples that show how you solved a problem, motivated a team, or achieved a goal.</p>
</li>
<li><p><strong>Practice answering common interview questions</strong>: Prepare to answer questions about your leadership style, conflict resolution skills, problem-solving abilities, and experience managing teams.</p>
</li>
<li><p><strong>Be confident and personable</strong>: During the interview, be confident and personable. Remember to listen carefully to the interviewer and provide thoughtful responses.</p>
</li>
</ol>
<p>Here are some specific questions you might encounter in a management interview:</p>
<ol>
<li><p>How do you manage conflicts within your team?</p>
</li>
<li><p>Can you tell me about a time when you had to make a difficult decision?</p>
</li>
<li><p>How do you motivate your team to achieve their goals?</p>
</li>
<li><p>How do you prioritize and manage your workload as a manager?</p>
</li>
<li><p>Can you describe your leadership style and how it has helped you achieve success?</p>
</li>
<li><p>How do you handle underperforming team members?</p>
</li>
<li><p>How do you stay up to date on industry trends and best practices?</p>
</li>
<li><p>Can you provide an example of a project you managed and how you achieved success?</p>
</li>
<li><p>Can you describe a time when you had to work with a difficult stakeholder or client?</p>
</li>
<li><p>How do you foster a positive and inclusive work culture within your team?</p>
</li>
</ol>
<p>Take some time to ponder these questions. <strong>There's no single right answer since they're open-ended, but there absolutely are wrong answers that you need to steer clear of</strong>.</p>
<p><em>Below I've shared some sample answers to showcase the style in which you can answer these questions but make sure you</em> <strong><em>personalize the answers</em></strong> <em>and are truthful to your own management style since the goal of these questions is to know the real you.</em></p>
<ol>
<li><p><strong>How do you manage conflicts within your team?</strong> When conflicts arise within my team, I first ensure that all parties are heard and their concerns are addressed. I encourage open communication and respect for one another's perspectives. Then, I work collaboratively with my team to find a solution that satisfies all parties involved and aligns with the company's goals.</p>
</li>
<li><p><strong>Can you tell me about a time when you had to make a difficult decision?</strong> When faced with a difficult decision, I gather all relevant information and perspectives, evaluate the pros and cons, and consider the potential impact on the team and the company. I weigh all factors carefully and make the decision that aligns best with the company's values and goals.</p>
</li>
<li><p><strong>How do you motivate your team to achieve their goals?</strong> I believe in leading by example and setting clear goals and expectations for my team. I encourage them to take ownership of their work and provide regular feedback and recognition for their achievements. I also create opportunities for growth and development, and empower them to make decisions and take risks.</p>
</li>
<li><p><strong>How do you prioritize and manage your workload as a manager?</strong> I prioritize my workload by setting clear goals and deadlines, and by identifying tasks that are urgent or important. I delegate tasks to team members when appropriate and ensure that each task aligns with the company's goals and priorities. I also regularly review and adjust my workload as needed.</p>
</li>
<li><p><strong>Can you describe your leadership style and how it has helped you achieve success?</strong> My leadership style is collaborative, supportive, and empowering. I believe in building strong relationships with my team members and creating a positive and inclusive work culture. I encourage open communication and respect for one another's perspectives. This approach has helped me achieve success by fostering a productive and motivated team that is aligned with the company's goals.</p>
</li>
<li><p><strong>How do you handle underperforming team members?</strong> When a team member is underperforming, I first assess the reasons for their poor performance and provide them with clear feedback and guidance on how to improve. I create a plan with them to address the issue and regularly check in on their progress. If the issue persists, I escalate the matter to senior management.</p>
</li>
<li><p><strong>How do you stay up to date on industry trends and best practices?</strong> I stay up to date on industry trends and best practices by attending industry events, reading industry publications, and networking with other professionals in the field. I also encourage my team members to do the same and share their findings with the rest of the team.</p>
</li>
<li><p><strong>Can you provide an example of a project you managed and how you achieved success?</strong> One project I managed involved developing a new product line for our company. I first identified the market opportunity and worked with my team to conduct market research and develop the product concept. We then worked collaboratively with other departments to design and produce the product, and successfully launched it to the market, exceeding our sales targets.</p>
</li>
<li><p><strong>Can you describe a time when you had to work with a difficult stakeholder or client?</strong> I once had to work with a stakeholder who had conflicting priorities and expectations. To address the issue, I first listened to their concerns and made sure to clearly communicate our goals and limitations. I then worked collaboratively with them to find a solution that satisfied both parties and aligned with the company's goals.</p>
</li>
<li><p><strong>How do you foster a positive and inclusive work culture within your team?</strong> I foster a positive and inclusive work culture by setting clear expectations and goals for my team, providing regular feedback and recognition, and encouraging open communication and respect for one another's perspectives. I also create opportunities for team building and personal growth, and ensure that diversity and inclusion are prioritized in all aspects of our work. I believe that a positive and inclusive work culture leads to a more engaged and motivated team that is better able to achieve its goals and contribute to the success of the company.</p>
</li>
</ol>
<p>I hope you found this article helpful.</p>
<p>If you want more personalized guidance on how to chart your journey to successful management roles, in a 1:1 with me, I'll teach you how to</p>
<ol>
<li><p>answer questions effectively and impress the interviewer,</p>
</li>
<li><p>improve your style and content delivery to get your point across lucidly, and</p>
</li>
<li><p>supercharge your preparation for your dream management roles.</p>
<p> If you'd find that valuable, you can book time with me for a <a target="_blank" href="https://topmate.io/phreakyphoenix/201077">Mock Management Interview</a> with personalized guidance and actionable insights. All the best!</p>
</li>
</ol>
]]></content:encoded></item><item><title><![CDATA[Performance Comparison of Polars vs Pandas]]></title><description><![CDATA[If you haven't read it already, do check out my article Introducing Polars, which goes over what is Polars, who it is for and the differences from Pandas in detail. This article only briefly touches upon them and focuses on the performance comparison...]]></description><link>https://blog.phreakyphoenix.tech/performance-comparison-of-polars-vs-pandas</link><guid isPermaLink="true">https://blog.phreakyphoenix.tech/performance-comparison-of-polars-vs-pandas</guid><category><![CDATA[Data Science]]></category><category><![CDATA[performance]]></category><category><![CDATA[Machine Learning]]></category><category><![CDATA[data]]></category><category><![CDATA[big data]]></category><dc:creator><![CDATA[Aditya Jyoti Paul]]></dc:creator><pubDate>Wed, 22 Mar 2023 13:01:49 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1679489986637/06f7a917-a62d-4199-b408-98389c192abe.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>If you haven't read it already, do check out my article <a target="_blank" href="https://phreakyphoenix.tech/blog/introducing-polars">Introducing Polars</a>, which goes over what is Polars, who it is for and the differences from Pandas in detail. This article only briefly touches upon them and focuses on the performance comparison,</p>
<h2 id="heading-introduction-to-polars">🐻‍❄️ Introduction to Polars</h2>
<p>Polars is an open-source data manipulation library that offers faster processing speeds and efficient memory usage compared to Pandas. It is built in Rust, a programming language known for its speed and safety, and offers a DataFrame API that is similar to Pandas.</p>
<p>When working with large datasets, choosing the right data manipulation library can significantly affect performance. In this blog, we will compare the performance of two popular data manipulation libraries: Pandas and Polars, using benchmarking.</p>
<h2 id="heading-differences-between-pandas-and-polars">🐼 Differences between Pandas and Polars</h2>
<p>Pandas is a widely used data manipulation library that offers a comprehensive set of tools for data analysis. However, its performance is limited by its reliance on the Python programming language, which is known to be slower than languages like Rust. In contrast, Polars is built in Rust, which enables it to deliver significantly faster performance.</p>
<p>One of the key differences between Pandas and Polars is their memory usage. Pandas stores data in memory as NumPy arrays, which can be memory-intensive for large datasets. Polars, on the other hand, is built on the Apache Arrow columnar format and can memory-map on-disk files, which lets it work with data without loading everything into memory at once. This results in more efficient memory usage, especially for large datasets.</p>
<p>Another significant difference between Pandas and Polars is their processing speed. Polars is designed to leverage the power of modern CPUs, making use of multi-threading and SIMD (Single Instruction Multiple Data) instructions. This enables it to perform operations like aggregations, joins, and filters much faster than Pandas, especially on large datasets.</p>
<h2 id="heading-set-up">🏄 Set Up</h2>
<p>The setup is simple: we need to install <code>pyarrow</code> and <code>polars</code>. Without pyarrow, Polars complains about it not being present. The experiments are run on an M1 Mac with 16 GB of RAM and a 256 GB SSD.</p>
<p>We will create a virtualenv for the experiment and run</p>
<pre><code class="lang-bash">pip install polars pandas pyarrow
</code></pre>
<p>To begin with, we create a data frame with 10 million rows and 10 columns using NumPy's random function. We will use this data frame for our benchmarking.</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> numpy <span class="hljs-keyword">as</span> np
<span class="hljs-keyword">import</span> pandas <span class="hljs-keyword">as</span> pd
<span class="hljs-keyword">import</span> polars <span class="hljs-keyword">as</span> pl

<span class="hljs-comment"># Create a DataFrame with 10 million rows and 10 columns</span>
df = pd.DataFrame(np.random.rand(<span class="hljs-number">10000000</span>, <span class="hljs-number">10</span>), columns=[<span class="hljs-string">'col_'</span>+str(i) <span class="hljs-keyword">for</span> i <span class="hljs-keyword">in</span> range(<span class="hljs-number">10</span>)])
pl_df = pl.DataFrame(df)
</code></pre>
<h3 id="heading-brief-aside-on-timeit">🔔 <strong>Brief aside on %timeit</strong></h3>
<p>The <code>%timeit</code> <a target="_blank" href="https://stackoverflow.com/questions/29280470/what-is-timeit-in-python">magic command</a> is used in Jupyter notebooks to measure the time taken by a piece of code to execute. The command runs the code multiple times to get an average execution time, which helps in eliminating any variations in the time taken by the code to execute due to system performance or other factors.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1679484396138/da2b19c2-cf37-4007-b1e2-bc409777fe0c.png" alt class="image--center mx-auto" /></p>
<p>In the code snippets provided below, <code>%timeit</code> is used to compare the performance of the Pandas and Polars libraries for various operations. The <code>-r7</code> parameter indicates that the measurement should be repeated 7 times, and <code>-n 1000</code> specifies that the code should be executed 1000 times per repetition. This helps make the results stable and reliable.</p>
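<p>Outside notebooks, the stdlib <code>timeit</code> module does the same job. A sketch measuring a stand-in expression with the same repeat and loop counts:</p>

```python
import timeit

# Rough stdlib equivalent of `%timeit -r7 -n 1000 expr` on a stand-in
# expression: 7 repeats of 1000 loops each, keeping the best repeat.
times = timeit.repeat("sum(range(100))", repeat=7, number=1000)
best_per_loop_us = min(times) / 1000 * 1e6
print(f"best of 7: {best_per_loop_us:.2f} microseconds per loop")
```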
<p>By using <code>%timeit</code> in this way, we can compare the performance of Pandas and Polars for various operations and determine which library is faster and more efficient for each command.</p>
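<p>Outside a notebook, the same measurement can be reproduced with Python's standard-library <code>timeit</code> module. Here is a minimal sketch of what <code>%timeit -r7 -n 1000</code> does under the hood; the statement being timed is purely illustrative, not one of the benchmarks below:</p>

```python
import timeit

# %timeit -r7 -n 1000 <stmt> roughly corresponds to:
# repeat the measurement 7 times (-r 7), running the statement
# 1000 times within each repeat (-n 1000).
timings = timeit.repeat(
    stmt="sum(range(100))",  # illustrative statement only
    repeat=7,
    number=1000,
)

# Each entry is the total time for 1000 executions; divide for per-call time.
per_call = [t / 1000 for t in timings]
print(f"best of 7: {min(per_call) * 1e6:.2f} microseconds per loop")
```

<p>Reporting the best of the repeats, as <code>%timeit</code> does, filters out runs that were slowed down by unrelated system activity.</p>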
<p>Now let's get into the operations we're going to benchmark.</p>
<h2 id="heading-benchmarking">📊 Benchmarking</h2>
<h3 id="heading-selecting-columns">📈 Selecting Columns</h3>
<p>We start our benchmarking by selecting a column. We select columns 'col_0' and 'col_1' from our data frame using Pandas and Polars.</p>
<pre><code class="lang-python"><span class="hljs-comment"># Selecting columns</span>
<span class="hljs-comment">#Pandas</span>
%timeit -r7 -n <span class="hljs-number">1000</span> df[[<span class="hljs-string">'col_0'</span>,<span class="hljs-string">'col_1'</span>]]

<span class="hljs-comment">#Polars</span>
%timeit -r7 -n <span class="hljs-number">1000</span> pl_df[[<span class="hljs-string">'col_0'</span>,<span class="hljs-string">'col_1'</span>]]
%timeit -r7 -n <span class="hljs-number">1000</span> pl_df.select(pl.col([<span class="hljs-string">'col_0'</span>,<span class="hljs-string">'col_1'</span>]))
</code></pre>
<p>The above code measures the execution time for each of the three operations (Pandas, Polars - using square brackets, and Polars - using select method) for 7 rounds with 1000 executions each.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1679484003463/21216552-3013-4732-a495-dd4d7129fd0b.png" alt class="image--center mx-auto" /></p>
<p>Pandas took 34.6ms, whereas Polars took 1.48μs with indexing and 37.7μs for the select method.</p>
<p>The results show that Polars outperforms Pandas significantly for this operation, and highlight that square-bracket indexing performed ~25 times faster than the select API for plain column selection, even though the expression API is generally the recommended approach. This exception is also called out in the Polars docs, in the <a target="_blank" href="https://pola-rs.github.io/polars-book/user-guide/howcani/selecting_data/selecting_data_indexing.html#selecting-with-indexing">selecting with indexing section</a>.</p>
<blockquote>
<p>Polars select API (37.7μs) is ~1000 times faster than Pandas (34.6ms).<br />Polars square bracket indexing (1.48μs) is about 23,000 times faster than Pandas (34.6ms).</p>
</blockquote>
<h3 id="heading-filtering-rows">📈 Filtering Rows</h3>
<p>Next, we filter rows based on a condition. We select rows where 'col_0' is greater than 0.5 using Pandas and Polars.</p>
<pre><code class="lang-python"><span class="hljs-comment">#Filtering rows </span>
<span class="hljs-comment">#Pandas</span>
%timeit -r7 -n <span class="hljs-number">1000</span> df.query(<span class="hljs-string">'col_0&gt;0.5'</span>)
<span class="hljs-comment">#Polars        </span>
%timeit -r7 -n <span class="hljs-number">1000</span> pl_df.filter(pl.col(<span class="hljs-string">'col_0'</span>)&gt;<span class="hljs-number">0.5</span>)
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1679485310996/5de45060-7754-40fe-af9a-2142cf344173.png" alt class="image--center mx-auto" /></p>
<p>The results show that Polars is faster than Pandas for this operation as well, but the margin is far smaller.</p>
<blockquote>
<p>Pandas (108ms) takes about 2.8 times as long as Polars (38.3ms) in filtering rows.</p>
</blockquote>
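<p>As an aside, Pandas also supports plain boolean-mask filtering, which is equivalent to <code>df.query</code> for simple conditions. A small illustrative sketch (not part of the benchmark above):</p>

```python
import numpy as np
import pandas as pd

# Illustrative frame, much smaller than the benchmark one
df = pd.DataFrame({'col_' + str(i): np.random.rand(1000) for i in range(10)})

# Two equivalent ways to filter rows in Pandas:
via_query = df.query('col_0 > 0.5')
via_mask = df[df['col_0'] > 0.5]

# Same rows, same index, same values
assert via_query.equals(via_mask)
```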
<h3 id="heading-groupby">📉 Groupby</h3>
<p>In the next operation, we group by 'col_0' and calculate the mean of 'col_1'. We use both Pandas and Polars for this operation.</p>
<pre><code class="lang-python"><span class="hljs-comment">#Grouping by col_0 and calculating the mean of col_1</span>
%timeit -r7 -n <span class="hljs-number">1000</span> df.groupby(<span class="hljs-string">'col_0'</span>)[<span class="hljs-string">'col_1'</span>].mean()              
%timeit -r7 -n <span class="hljs-number">1000</span> df.groupby(<span class="hljs-string">'col_0'</span>)[<span class="hljs-string">'col_1'</span>].agg(<span class="hljs-string">'mean'</span>)         

<span class="hljs-comment">#polars</span>
%timeit -r7 -n <span class="hljs-number">1000</span> pl_df.groupby(<span class="hljs-string">'col_0'</span>).agg([pl.col(<span class="hljs-string">'col_1'</span>).mean()]) <span class="hljs-comment">#select method</span>
%timeit -r7 -n <span class="hljs-number">1000</span> pl_df.groupby(<span class="hljs-string">'col_0'</span>).agg(pl.mean(<span class="hljs-string">'col_1'</span>))<span class="hljs-comment">#short</span>
</code></pre>
<p>The results show that Polars is slower than Pandas for groupby. Polars offers two syntaxes for this operation, and both are slower than Pandas here.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1679485806722/eea745f2-16d2-4ff6-9fcb-ee32e33708da.png" alt class="image--center mx-auto" /></p>
<p>In Pandas, <code>.mean()</code> was slightly slower than <code>.agg('mean')</code>.</p>
<p>In Polars, using <code>pl.col</code> to extract the column and then calling <code>.mean()</code> on it was slightly faster than calling <code>pl.mean</code> on the column name directly.</p>
<blockquote>
<p>Overall, Pandas (3.62s) was almost twice as fast as Polars (7s), irrespective of the syntax used.</p>
</blockquote>
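<p>To confirm that the two Pandas spellings really compute the same thing, here is a tiny illustrative frame where the group means are easy to verify by hand:</p>

```python
import pandas as pd

# Tiny illustrative frame: two groups of two rows each
df = pd.DataFrame({"col_0": [0, 0, 1, 1], "col_1": [1.0, 3.0, 2.0, 4.0]})

# The two Pandas spellings benchmarked above return identical results:
a = df.groupby("col_0")["col_1"].mean()
b = df.groupby("col_0")["col_1"].agg("mean")

assert a.equals(b)
assert a.to_list() == [2.0, 3.0]  # mean of group 0, mean of group 1
```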
<h3 id="heading-conversion-between-wide-and-long-formats">📈 Conversion between wide and long formats</h3>
<p>Lastly, we compare the performance of Pandas and Polars for conversion between wide and long formats using the melt function.</p>
<pre><code class="lang-python"><span class="hljs-comment">#Conversion between wide and long formats</span>
<span class="hljs-comment">#Pandas</span>
%timeit -r7 -n <span class="hljs-number">1000</span> pd.melt(df, id_vars=[<span class="hljs-string">'col_0'</span>], value_vars=[<span class="hljs-string">'col_1'</span>, <span class="hljs-string">'col_2'</span>, <span class="hljs-string">'col_3'</span>, <span class="hljs-string">'col_4'</span>, <span class="hljs-string">'col_5'</span>, <span class="hljs-string">'col_6'</span>, <span class="hljs-string">'col_7'</span>, <span class="hljs-string">'col_8'</span>, <span class="hljs-string">'col_9'</span>])
<span class="hljs-comment">#Polars</span>
%timeit -r7 -n <span class="hljs-number">1000</span> pl_df.melt(id_vars=[<span class="hljs-string">'col_0'</span>], value_vars=[<span class="hljs-string">'col_1'</span>, <span class="hljs-string">'col_2'</span>, <span class="hljs-string">'col_3'</span>, <span class="hljs-string">'col_4'</span>, <span class="hljs-string">'col_5'</span>, <span class="hljs-string">'col_6'</span>, <span class="hljs-string">'col_7'</span>, <span class="hljs-string">'col_8'</span>, <span class="hljs-string">'col_9'</span>])
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1679486496100/9171aa2a-ccfc-4862-b665-6a40f59358c0.png" alt class="image--center mx-auto" /></p>
<blockquote>
<p>Thus Polars (696ms) is about 5.5 times faster than Pandas (3.88s) for melt.</p>
</blockquote>
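<p>For reference, here is what <code>melt</code> does on a tiny illustrative frame: each (id row, value column) pair becomes one row in the long format.</p>

```python
import pandas as pd

# Tiny illustrative frame, not the benchmark one
df = pd.DataFrame({"col_0": [1, 2], "col_1": [3, 4], "col_2": [5, 6]})

# Wide -> long: one output row per (id row, value column) pair
long_df = pd.melt(df, id_vars=["col_0"], value_vars=["col_1", "col_2"])

assert list(long_df.columns) == ["col_0", "variable", "value"]
assert len(long_df) == 4  # 2 rows x 2 value columns
```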
<h3 id="heading-wrap-up">🎉 Wrap Up</h3>
<p>In conclusion, Polars offers a compelling alternative to Pandas for data manipulation tasks, particularly when dealing with large datasets that require efficient memory usage and high processing speeds. While Pandas remains a popular and powerful library, Polars' performance advantages make it a valuable addition to any data scientist or data engineer's toolkit.</p>
<p>Here's an image with all the code and runtimes together.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1679488446369/c31afe77-9d72-45f9-96a0-96b17341368e.png" alt class="image--center mx-auto" /></p>
<p>If you're enjoying this series, I highly recommend reading the official <a target="_blank" href="https://pola-rs.github.io/polars-book/user-guide/introduction.html">Polars User Guide</a> to learn more about the syntax and query optimization. I'd love to hear on social media which Polars feature or feature set you'd like me to cover in the next article.</p>
<p>If you liked this article, please show some love and do share it with your network who would find it valuable. See you in the next article.</p>
]]></content:encoded></item><item><title><![CDATA[Introducing Polars]]></title><description><![CDATA[As data scientists, we are constantly searching for tools that will enable us to manipulate and analyze large datasets efficiently. Pandas is the most widely used tool in the data science community for data manipulation and analysis, and it has been ...]]></description><link>https://blog.phreakyphoenix.tech/introducing-polars</link><guid isPermaLink="true">https://blog.phreakyphoenix.tech/introducing-polars</guid><category><![CDATA[Data Science]]></category><category><![CDATA[big data]]></category><category><![CDATA[Machine Learning]]></category><category><![CDATA[Deep Learning]]></category><category><![CDATA[time-series-database]]></category><dc:creator><![CDATA[Aditya Jyoti Paul]]></dc:creator><pubDate>Mon, 20 Mar 2023 12:53:54 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1678981977657/0f1fac15-d5e4-44a1-a06a-328809a163db.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>As data scientists, we are constantly searching for tools that will enable us to manipulate and analyze large datasets efficiently. Pandas is the most widely used tool in the data science community for data manipulation and analysis, and it has been for a long time. However, as datasets become larger and more complex, pandas can become a bottleneck in the data processing pipeline, especially when dealing with time-series data. In this blog post, we will explore a new data manipulation library, Polars, that aims to provide fast and memory-efficient data manipulation and analysis. We will discuss the key features of Polars and compare it with pandas to see how it stacks up in terms of performance, ease of use, and functionality.</p>
<h3 id="heading-what-is-polars">What is Polars?</h3>
<p>Polars is a new data manipulation library written in Rust, a systems programming language that prioritizes safety, speed, and concurrency. Rust is known for its memory safety and speed, making it an excellent choice for building a high-performance data manipulation library like Polars. Polars provides a DataFrame API that is similar to pandas, but with some key differences that make it more efficient and faster for large datasets. Polars can handle large datasets, even if they don't fit into memory, by utilizing lazy evaluation and chunking. It also provides support for time-series data, which can be challenging to handle efficiently in pandas.</p>
<h3 id="heading-key-features-of-polars">Key Features of Polars:</h3>
<ol>
<li><p><strong>Lazy evaluation</strong>: Polars utilizes lazy evaluation to avoid unnecessary computations, which can save a lot of time and memory when working with large datasets. Lazy evaluation is a technique used to delay the computation of an expression until it is needed. This means that Polars will not compute any operation until it is necessary, which reduces the memory footprint of the operation.</p>
</li>
<li><p><strong>Chunking</strong>: Polars can handle datasets that do not fit into memory by chunking the data into smaller pieces. This allows Polars to perform operations on each chunk separately, reducing the memory usage of the operation. The result is that Polars can handle larger datasets than pandas, even if they do not fit into memory.</p>
</li>
<li><p><strong>Time-series support</strong>: Polars provides excellent support for time-series data, which can be challenging to handle efficiently in pandas. Polars provides time-series specific operations, such as rolling and resampling, that are optimized for speed and memory usage.</p>
</li>
<li><p><strong>Parallelism</strong>: Polars is designed to take advantage of modern CPUs with multiple cores. It can use all available CPU cores to perform operations in parallel, making it faster than pandas in many cases.</p>
</li>
<li><p><strong>Rust memory safety</strong>: Rust provides memory safety by preventing memory errors such as buffer overflows and null pointer dereferences. This makes Polars a safe and reliable library to use.</p>
</li>
</ol>
<h3 id="heading-comparing-polars-and-pandas">Comparing Polars and Pandas:</h3>
<ol>
<li><p><strong>Performance</strong>: Polars is faster than pandas for many operations, especially when dealing with large datasets. Polars achieves this performance improvement by utilizing lazy evaluation, chunking, and parallelism. This means that Polars can handle larger datasets and perform operations faster than pandas in many cases.</p>
</li>
<li><p><strong>Memory usage</strong>: Polars uses less memory than pandas, especially when dealing with large datasets. This is because Polars uses lazy evaluation and chunking to reduce the memory footprint of operations.</p>
</li>
<li><p><strong>Ease of use</strong>: Polars has a similar API to pandas, so it is easy to use for those familiar with pandas. However, some operations may require a different approach in Polars, so there may be a learning curve for some users.</p>
</li>
<li><p><strong>Functionality</strong>: Polars has a similar set of functionality as pandas, with some differences due to the design choices made by the developers. However, Polars provides excellent support for time-series data, which can be challenging to handle in pandas.</p>
</li>
</ol>
<p>Both tools are powerful, but choosing between them requires understanding their trade-offs. Let's now take a closer look at Polars, compare it to Pandas in more detail, and explore the benefits and drawbacks of each.</p>
<h3 id="heading-introduction-to-polars"><strong>Introduction to Polars</strong></h3>
<p>Polars is a Rust-based data manipulation library that aims to be a faster, safer and more ergonomic alternative to Pandas. It is designed to handle large datasets efficiently, with minimal memory usage, and provides an expressive API for data manipulation. The Polars library is built on top of Apache Arrow, which allows it to efficiently handle large datasets in a distributed environment.</p>
<h3 id="heading-key-differences-between-polars-and-pandas"><strong>Key Differences between Polars and Pandas</strong></h3>
<p>Both Polars and Pandas are powerful tools for data manipulation, but they have some key differences that may make one a better fit for your use case. Here are some of the key differences:</p>
<h4 id="heading-performance">Performance</h4>
<p>One of the key advantages of Polars over Pandas is its performance. Polars is designed to be faster and more memory-efficient than Pandas. This is because it is written in Rust, which is a faster language than Python. Additionally, Polars uses Apache Arrow to store data in a columnar format, which is more memory-efficient than the row-based format used by Pandas. In benchmarks, Polars has been shown to be up to 50 times faster than Pandas for certain operations.</p>
<h4 id="heading-memory-usage">Memory Usage</h4>
<p>Polars uses a columnar format to store data, which can be more memory-efficient than the row-based format used by Pandas. This is because columnar storage allows for better compression and can reduce the amount of memory needed to store data. Additionally, Polars uses lazy evaluation, which means that it only computes operations when they are needed, which can further reduce memory usage.</p>
<h4 id="heading-api">API</h4>
<p>Both Polars and Pandas have powerful APIs for data manipulation. However, the two libraries have some key differences in their APIs. Polars has a more concise API that is designed to be more ergonomic than Pandas. Additionally, Polars supports method chaining, which allows you to chain together multiple operations in a single expression. This can make code more readable and concise.</p>
<h4 id="heading-language">Language</h4>
<p>Polars is written in Rust, which is a faster language than Python. Pandas is written in Python, which is a popular language for data science. While Rust may be faster than Python, it may be harder to learn than Python, especially if you are new to programming.</p>
<h4 id="heading-compatibility">Compatibility</h4>
<p>While Polars has a similar API to pandas, there are some differences in how operations are performed that may require a different approach. However, these differences are minor and should not be a significant barrier to entry for users familiar with pandas. That said, using idiomatic Polars syntax is the best way to leverage it to its fullest.</p>
<p>In conclusion, Polars is an excellent alternative to pandas for data manipulation and analysis, especially when dealing with large datasets. Its focus on memory efficiency, speed, and support for time-series data make it a valuable addition to any data scientist's toolbox.</p>
<p>Here are some of the differences between Polars and Pandas against the key features:</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td><strong>Feature</strong></td><td><strong>Polars</strong></td><td><strong>Pandas</strong></td></tr>
</thead>
<tbody>
<tr>
<td>Language</td><td>Rust</td><td>Python</td></tr>
<tr>
<td>Memory usage</td><td>Uses less memory, especially when dealing with large datasets.</td><td>Can use a lot of memory, especially when dealing with large datasets.</td></tr>
<tr>
<td>Performance</td><td>Faster than pandas for many operations, especially when dealing with large datasets.</td><td>Slower than Polars for large datasets and some operations.</td></tr>
<tr>
<td>Lazy evaluation</td><td>Utilizes lazy evaluation to avoid unnecessary computations, reducing memory usage.</td><td>Does not utilize lazy evaluation, which can lead to unnecessary computations and increased memory usage.</td></tr>
<tr>
<td>Chunking</td><td>Can handle datasets that do not fit into memory by chunking the data into smaller pieces.</td><td>Cannot handle datasets that do not fit into memory without additional tools like Dask or Vaex.</td></tr>
<tr>
<td>Time-series support</td><td>Provides excellent support for time-series data, including time-series specific operations.</td><td>Supports time-series data, but lacks some of the time-series specific operations provided by Polars.</td></tr>
<tr>
<td>Parallelism</td><td>Designed to take advantage of modern CPUs with multiple cores.</td><td>Does not utilize parallelism by default.</td></tr>
</tbody>
</table>
</div><p>Overall, Polars provides better memory efficiency and performance for large datasets, but requires some additional learning for users familiar with pandas. However, for users working with time-series data or dealing with memory constraints, Polars may be the better choice. In my experience, if you're not working with datasets of over roughly a million rows, the choice between the two does not make a significant difference, although Polars does have much faster filtering and reshaping in general. To learn more about the performance comparison, stay tuned for the next article.</p>
]]></content:encoded></item><item><title><![CDATA[Deploy your Hashnode blog on a subdirectory serverlessly]]></title><description><![CDATA[🤠 Why read this really long blog :P
Are you a Hashnode blogger looking to host your blog on a subdomain? Are you interested in serverless computing and how it can help you build scalable and cost-effective solutions? Look no further than Cloudflare ...]]></description><link>https://blog.phreakyphoenix.tech/deploy-your-hashnode-blog-on-a-subdirectory-serverlessly</link><guid isPermaLink="true">https://blog.phreakyphoenix.tech/deploy-your-hashnode-blog-on-a-subdirectory-serverlessly</guid><category><![CDATA[Developer]]></category><category><![CDATA[Blogging]]></category><category><![CDATA[Hashnode]]></category><category><![CDATA[cloudflare]]></category><category><![CDATA[serverless]]></category><dc:creator><![CDATA[Aditya Jyoti Paul]]></dc:creator><pubDate>Tue, 14 Mar 2023 13:44:28 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1678799732515/f4025271-4af1-4ded-939c-64f3be941f1d.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2 id="heading-why-read-this-really-long-blog-p">🤠 Why read this really long blog :P</h2>
<p>Are you a Hashnode blogger looking to host your blog on a subdomain? Are you interested in serverless computing and how it can help you build scalable and cost-effective solutions? Look no further than Cloudflare Workers! In this post, we'll walk through how to use Cloudflare Workers to host a Hashnode blog on a subdirectory, while taking advantage of the benefits of serverless computing.</p>
<p>If you haven't read about the advantages of subdirectory hosting for your blog yet, check out <a target="_blank" href="https://blog.phreakyphoenix.tech/subdomains-vs-subdirectories-which-is-right-for-your-blog">my previous post in this series</a>.</p>
<h2 id="heading-what-is-serverless-computing">☁️ What is Serverless Computing?</h2>
<p>Serverless computing is a model in which a cloud provider dynamically manages the allocation and provisioning of servers. This means that developers do not need to worry about managing servers or infrastructure, and can instead focus on writing code. With serverless computing, you only pay for what you use, and you don't need to worry about scaling up or down based on user demand.</p>
<h2 id="heading-what-are-cloudflare-workers">👷 What are Cloudflare Workers?</h2>
<p>Cloudflare Workers are a serverless computing platform that allows developers to write and deploy code that runs on Cloudflare's edge network. This means that the code runs in data centers all around the world, closer to your users, resulting in faster response times and lower latency. Cloudflare Workers are a great option for building serverless applications, as they are highly scalable and cost-effective.</p>
<h2 id="heading-advantages-of-serverless-computing">💪 Advantages of Serverless Computing</h2>
<p>There are many advantages to using serverless computing, including:</p>
<ul>
<li><p><strong>Scalability</strong>: Serverless architectures can automatically scale up or down based on user demand, ensuring that your application can handle sudden spikes in traffic.</p>
</li>
<li><p><strong>Cost-effectiveness</strong>: Serverless computing can help you save money, as you only pay for what you use. This means that you don't need to worry about paying for idle server time.</p>
</li>
<li><p><strong>Reduced complexity</strong>: With serverless computing, you don't need to worry about managing servers or infrastructure, as this is all handled by the cloud provider. This means that you can focus on writing code instead of worrying about server management.</p>
</li>
<li><p><strong>Faster time-to-market</strong>: Because you don't need to worry about infrastructure, serverless computing can help you get your applications to market faster.</p>
</li>
</ul>
<h2 id="heading-so-why-cloudflare">⚡So why Cloudflare?</h2>
<p>Reasons to use Cloudflare for hosting content from a subdomain to a subdirectory:</p>
<ul>
<li><p>Cloudflare Workers provide a serverless platform that allows for running code on their network of servers, reducing the need for traditional hosting infrastructure.</p>
</li>
<li><p>Cloudflare's global network allows for fast delivery of content and can cache content at the edge, further improving performance.</p>
</li>
<li><p>Cloudflare Workers support async requests, which can make it easier to handle multiple requests simultaneously and improve performance.</p>
</li>
<li><p>Using a subdomain allows for separating content and improving security, and Cloudflare Workers make it easy to redirect requests from a subdomain to a subdirectory, giving you the best of both worlds.</p>
</li>
</ul>
<p>Alternatives to using Cloudflare:</p>
<ul>
<li><p>Traditional hosting solutions may require more maintenance and have higher costs than Cloudflare Workers.</p>
</li>
<li><p>Other serverless platforms, such as AWS Lambda, could provide similar functionality but may have a steeper learning curve and require more setup.</p>
</li>
<li><p>Using a reverse proxy server (with Apache or NGINX for example) could also provide similar functionality, but it may require more setup and maintenance than using Cloudflare Workers.</p>
</li>
</ul>
<p>⚠️ This approach is not officially supported by Hashnode and hence can break at any point if the architecture/page structure changes.</p>
<h2 id="heading-im-convinced-show-me-how-to-do-it">🙋 I'm convinced, show me how to do it.</h2>
<p>Awesome, now let's delve into the nitty-gritty of the implementation. Cloudflare Workers are really easy to set up. If you're new to this, you can learn more about Workers <a target="_blank" href="https://developers.cloudflare.com/workers/">here</a> and try out the playground <a target="_blank" href="https://cloudflareworkers.com/">here</a>. You can use the Wrangler CLI if you're already comfortable with Worker deployments; otherwise I'd recommend the GUI-based method for beginners, and that's the method I'll follow throughout the rest of this blog.</p>
<h3 id="heading-publish-your-blog-on-your-custom-subdomain">🕸️ Publish your blog on your custom subdomain</h3>
<p>Publishing your blog on a custom subdomain with Hashnode is a pre-requisite to this, but if you haven't done that yet, please follow the detailed steps for the same <a target="_blank" href="https://support.hashnode.com/en/articles/5755362-how-to-map-a-custom-domain">here</a>. I recommend just setting up the CNAME record to <code>hashnode.network</code> on Cloudflare DNS, and keeping the orange cloud on.</p>
<h3 id="heading-verify-your-domain-on-cloudflare">🔑 Verify your domain on Cloudflare</h3>
<ol>
<li><p>Click on Create Site and add your domain name. So if your blog is currently at blog.example.com, enter example.com.</p>
</li>
<li><p>Select a plan. The Free plan works for most use cases, including mine, so let's start with that, thanks to Cloudflare's generous limits.</p>
</li>
<li><p>Review your nameservers; you'll need to point them to Cloudflare, with step-by-step instructions available <a target="_blank" href="https://developers.cloudflare.com/dns/zone-setups/full-setup/setup/">here</a>. This can take up to 24 hours, but usually completes within a few minutes.</p>
</li>
<li><p>Ultimately you should see your domain name with a green tick and the word "Active" like this.</p>
</li>
</ol>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1678736662255/9428caef-33a5-4351-a34b-27fb5e7664ea.png" alt class="image--center mx-auto" /></p>
<h3 id="heading-create-your-worker">👷 Create your Worker</h3>
<p>You can create a worker either from <code>Workers &gt; Overview &gt; Create a service</code> from the left pane on your <a target="_blank" href="https://dash.cloudflare.com/">primary dashboard</a></p>
<p>or</p>
<p>Click on your domain name (example.com) and then go to <code>Workers Routes &gt; Manage Workers &gt; Create a Service</code>.</p>
<p>And now you'd be here:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1678736796822/15e17dc6-ef24-496c-9fd0-c07286147fa2.png" alt class="image--center mx-auto" /></p>
<p>Give your worker a recognizable name, here I've named it <code>d2s</code> short for sub<strong>d</strong>omain-<strong>to</strong>-<strong>s</strong>ubdirectory, select HTTP router and click <code>Create Service</code>.</p>
<p>Kudos on creating your first worker service. Now we need to add the code.</p>
<h3 id="heading-define-the-data-flow-and-redirect-paths">🪄 Define the data flow and redirect paths</h3>
<p>Click on the <code>Quick Edit</code> button to access the Worker code.</p>
<p>The entire worker.js file is hosted on <a target="_blank" href="https://github.com/phreakyphoenix/cloudflare-deploy-subdomain-to-subdirectory">GitHub</a>, and you can just copy the worker.js file and change the hostname from blog.example.com to the domain where you wish to host it.</p>
<h3 id="heading-set-the-routes">🧭 Set the Routes</h3>
<p>Browse over to the Workers tab on the left pane, click on Manage Service, and go to the Routes section. Add <code>example.com/*</code> as the Route and your domain as the Zone. You may disable the pre-defined routes if you wish; here's my setup as a reference, and you should see your active routes like at the bottom of this image.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1678738030371/a031ae05-1ba4-41e6-8380-af9867f31704.png" alt class="image--center mx-auto" /></p>
<h3 id="heading-woohoo">🎉 Woohoo!</h3>
<p>That's it! Your blog is now successfully deployed to example.com/blog and is also accessible via blog.example.com. You can verify that a new <code>Worker</code> DNS record pointing to the worker you just deployed has been created in your DNS settings.</p>
<p>Pat yourself on your back, coz you deserve it and here's a cute panda rejoicing with you!</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1678739814094/b36b2210-4fce-4acf-8703-680588d2f44e.png" alt="Happy Panda after deploying to subdirectory auccessfully, credits to Midjourney" class="image--center mx-auto" /></p>
<h2 id="heading-the-secret-sauce-in-workerjs">🧙The Secret Sauce in worker.js</h2>
<p>I wanted to save this part for last so as not to bog you down with code. But if you're a brave adventurer and want to extend the capabilities of your blog, worry not: here I'll explain how the code works, along with all the interesting nuggets of how I made this work.</p>
<p>Some helpful references before we begin:</p>
<ol>
<li><p><a target="_blank" href="https://createtoday.io/posts/how-to-serve-a-subdomain-as-a-subdirectory-using-cloudflare">Create Today Demo on deploying a simple site from subdomain to subdirectory</a>.</p>
</li>
<li><p><a target="_blank" href="https://blog.cloudflare.com/subdomains-vs-subdirectories-improved-seo-part-2/">Cloudflare Blog on SEO best practices and subdirectory implementation</a></p>
</li>
</ol>
<p>These were the only two resources I could find which inspired my work.</p>
<h3 id="heading-defining-the-global-variables">🌎 Defining the global variables</h3>
<p>First we'll define our variables like hostname, blog directory and asset pathnames like this.</p>
<pre><code class="lang-javascript"><span class="hljs-comment">// keep track of all our blog endpoints here</span>
<span class="hljs-keyword">const</span> myBlog = {
  <span class="hljs-attr">hostname</span>: <span class="hljs-string">"blog.example.com"</span>,
  <span class="hljs-attr">targetSubdirectory</span>: <span class="hljs-string">"/blog"</span>,
  <span class="hljs-attr">assetsPathnames</span>: [<span class="hljs-string">"/_next/"</span>, <span class="hljs-string">"/js/"</span>, <span class="hljs-string">'/api/'</span>, <span class="hljs-string">'_axiom'</span>, <span class="hljs-string">'/ping/'</span>] 
}
</code></pre>
<h3 id="heading-explanations-of-the-code-segments">📜 Explanations of the code segments</h3>
<p>We create a function named <code>handleRequest</code> that handles the routing logic based on the request, along with two helper classes, <code>AttributeRewriter</code> and <code>ScriptAdder</code>, that rewrite the blog content on the fly for subdirectory hosting. We also create a helper function, <code>gatherResponse</code>, to return the response body as a string. Finally, we add an event listener that listens for fetch events and responds with the result of <code>handleRequest</code>.</p>
<p>Let's understand them one by one.</p>
<p>1. <code>gatherResponse</code> awaits and returns the response body as a string. If the Content-Type is JSON, it parses the body and re-serializes it to a string; otherwise it simply returns the response text.</p>
<pre><code class="lang-javascript"><span class="hljs-keyword">async</span> <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">gatherResponse</span>(<span class="hljs-params">response</span>) </span>{
  <span class="hljs-keyword">const</span> contentType = response.headers.get(<span class="hljs-string">'content-type'</span>) || <span class="hljs-string">''</span>;
  <span class="hljs-keyword">if</span> (contentType.includes(<span class="hljs-string">'application/json'</span>)) {
    <span class="hljs-keyword">return</span> <span class="hljs-built_in">JSON</span>.stringify(<span class="hljs-keyword">await</span> response.json());
  }
  <span class="hljs-keyword">return</span> response.text();
}
</code></pre>
<p>2. The <code>AttributeRewriter</code> class rewrites the relative URLs in the HTML content that Hashnode sends to the client, so that the subdirectory-hosted site behaves as expected. Without it, requests would go to /awesome-blog (blogs on Hashnode all live at the root of their subdomain) instead of /blog/awesome-blog; we want them sent to the targetSubdirectory /blog instead.</p>
<pre><code class="lang-javascript"><span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">AttributeRewriter</span> </span>{
  <span class="hljs-keyword">constructor</span>(attributeName) {
    <span class="hljs-built_in">this</span>.attributeName = attributeName;
  }
  element(element) {
    <span class="hljs-keyword">const</span> attribute = element.getAttribute(<span class="hljs-built_in">this</span>.attributeName);
    <span class="hljs-comment">//add check for targetSubdirectory start for nested scenarios</span>
    <span class="hljs-keyword">if</span> (attribute &amp;&amp; !attribute.startsWith(<span class="hljs-string">'https://'</span>)) 
    {
      element.setAttribute(<span class="hljs-built_in">this</span>.attributeName, myBlog.targetSubdirectory+attribute);
    }
  }
}

<span class="hljs-keyword">const</span> htmlrewriter = <span class="hljs-keyword">new</span> HTMLRewriter()
  .on(<span class="hljs-string">'a'</span>, <span class="hljs-keyword">new</span> AttributeRewriter(<span class="hljs-string">'href'</span>))
</code></pre>
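<p>The rewrite rule at the heart of <code>AttributeRewriter</code> can be extracted into a plain function and tested in isolation (the function name is mine, not part of the Worker):</p>
<pre><code class="lang-javascript">// Rewrite rule used by AttributeRewriter: relative URLs get the
// subdirectory prefix; absolute https:// URLs are left untouched.
function rewriteAttr(value, subdir) {
  if (value == null) return value;                // missing attribute: leave as-is
  if (value.startsWith('https://')) return value; // absolute URL: skip
  return subdir + value;                          // relative URL: add the prefix
}

console.log(rewriteAttr('/awesome-blog', '/blog')); // '/blog/awesome-blog'
</code></pre>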
<p>3. <code>ScriptAdder</code> injects a small custom script that prepends the targetSubdirectory, i.e. /blog, to the browser URL and history for any blog post opened without it. Without this, refreshes on relative URLs and back/forward navigation would break.</p>
<pre><code class="lang-javascript"><span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">ScriptAdder</span> </span>{
  element(element) {
      element.prepend(<span class="hljs-string">'&lt;script&gt;function o(){location.href!=a&amp;&amp;(location.replace("'</span>+myBlog.targetSubdirectory+<span class="hljs-string">'"+location.pathname),a=location.href)}var a=location.href;setInterval(o,1);&lt;/script&gt;'</span>,{<span class="hljs-attr">html</span>: <span class="hljs-literal">true</span>});
  }
}

<span class="hljs-keyword">const</span> scriptadder = <span class="hljs-keyword">new</span> HTMLRewriter()
  .on(<span class="hljs-string">'head'</span>, <span class="hljs-keyword">new</span> ScriptAdder())
</code></pre>
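<p>The minified one-liner injected by <code>ScriptAdder</code> is dense, so here is a readable equivalent (names are mine; the <code>location</code> object is passed in as a parameter purely so the logic can be exercised outside a browser):</p>
<pre><code class="lang-javascript">// Readable equivalent of the injected script: every tick, if the URL
// has changed since the last check, prepend the subdirectory to the
// path and remember the new URL. Note it prepends unconditionally,
// exactly as the minified original does.
function makeUrlFixer(loc, subdir) {
  let lastHref = loc.href;
  return function check() {
    if (loc.href != lastHref) {
      loc.replace(subdir + loc.pathname);
      lastHref = loc.href;
    }
  };
}
// In the browser, the injected script effectively runs:
// setInterval(makeUrlFixer(location, '/blog'), 1);
</code></pre>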
<ol>
<li><p><code>handleRequest</code> checks whether the incoming request is a GET or a POST.</p>
<ol>
<li><p>POST requests are forwarded to the subdomain-hosted site, and the response is sent back to the request that originated from the subdirectory. Additionally, for /api/collect pageview requests, the payload itself must be rewritten with the correct URL and hostname.</p>
</li>
<li><p>GET requests are handled differently depending on whether they're for a blog document or for assets (CSS, JS, etc.), and are passed through if they're unrelated.</p>
</li>
</ol>
</li>
</ol>
<pre><code class="lang-javascript"><span class="hljs-keyword">async</span> <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">handleRequest</span>(<span class="hljs-params">request</span>) </span>{
    <span class="hljs-keyword">const</span> parsedUrl = <span class="hljs-keyword">new</span> URL(request.url)
    <span class="hljs-keyword">const</span> requestMatches = <span class="hljs-function"><span class="hljs-params">match</span> =&gt;</span> <span class="hljs-keyword">new</span> <span class="hljs-built_in">RegExp</span>(match).test(parsedUrl.pathname)

    <span class="hljs-comment">// console.log(request.body)</span>
    <span class="hljs-keyword">if</span> (request.method === <span class="hljs-string">'POST'</span>) {
      <span class="hljs-keyword">if</span> (requestMatches(<span class="hljs-string">"/api/collect"</span>)) {
        <span class="hljs-keyword">var</span> post_body = <span class="hljs-keyword">await</span> request.json()
        <span class="hljs-built_in">console</span>.log(post_body)
        <span class="hljs-keyword">var</span> req_url = post_body[<span class="hljs-string">"payload"</span>][<span class="hljs-string">"url"</span>]
        req_url = req_url.split(<span class="hljs-string">"/"</span>)[<span class="hljs-number">2</span>];
        post_body[<span class="hljs-string">"payload"</span>][<span class="hljs-string">"url"</span>] = <span class="hljs-string">'/'</span>+req_url;
        post_body[<span class="hljs-string">"payload"</span>][<span class="hljs-string">"hostname"</span>] = <span class="hljs-string">`<span class="hljs-subst">${myBlog.hostname}</span>`</span>;

        <span class="hljs-keyword">const</span> mod_req = {
        <span class="hljs-attr">payload</span>: post_body[<span class="hljs-string">"payload"</span>],
        <span class="hljs-attr">type</span>: <span class="hljs-string">"pageview"</span>,
        };

        <span class="hljs-built_in">console</span>.log(mod_req)
        <span class="hljs-keyword">const</span> response = <span class="hljs-keyword">await</span> fetch(<span class="hljs-string">`https://<span class="hljs-subst">${myBlog.hostname}</span>/<span class="hljs-subst">${parsedUrl.pathname}</span>`</span>, mod_req);
        <span class="hljs-keyword">const</span> results = <span class="hljs-keyword">await</span> gatherResponse(response);
        <span class="hljs-keyword">return</span> <span class="hljs-keyword">new</span> Response(results, request);
      }
      <span class="hljs-keyword">const</span> response = <span class="hljs-keyword">await</span> fetch(<span class="hljs-string">`https://<span class="hljs-subst">${myBlog.hostname}</span>/<span class="hljs-subst">${parsedUrl.pathname}</span>`</span>, request);
      <span class="hljs-keyword">const</span> results = <span class="hljs-keyword">await</span> gatherResponse(response);
      <span class="hljs-keyword">return</span> <span class="hljs-keyword">new</span> Response(results, request);
    }

    <span class="hljs-comment">// else method is GET</span>

    <span class="hljs-comment">// blog HTML</span>
    <span class="hljs-keyword">if</span> (requestMatches(myBlog.targetSubdirectory)) {
      <span class="hljs-built_in">console</span>.log(<span class="hljs-string">"this is a request for a blog document"</span>, parsedUrl.pathname);

      <span class="hljs-keyword">const</span> pruned = parsedUrl.pathname.split(<span class="hljs-string">"/"</span>).filter(<span class="hljs-function"><span class="hljs-params">part</span> =&gt;</span> part);
      <span class="hljs-keyword">if</span> (parsedUrl.pathname.startsWith(myBlog.targetSubdirectory+<span class="hljs-string">'/newsletter'</span>)){
        <span class="hljs-keyword">return</span> scriptadder.transform (htmlrewriter.transform(<span class="hljs-keyword">await</span>(fetch(<span class="hljs-string">`https://<span class="hljs-subst">${myBlog.hostname}</span>/<span class="hljs-subst">${pruned.slice(<span class="hljs-number">1</span>).join(<span class="hljs-string">"/"</span>)}</span>`</span>))));
      }
      <span class="hljs-keyword">if</span> (pruned.length==<span class="hljs-number">1</span>){
        <span class="hljs-keyword">return</span> scriptadder.transform (htmlrewriter.transform(<span class="hljs-keyword">await</span>(fetch(<span class="hljs-string">`https://<span class="hljs-subst">${myBlog.hostname}</span>`</span>))));
      }
      <span class="hljs-keyword">else</span>{
        <span class="hljs-keyword">return</span> htmlrewriter.transform(<span class="hljs-keyword">await</span>(fetch(<span class="hljs-string">`https://<span class="hljs-subst">${myBlog.hostname}</span>/<span class="hljs-subst">${pruned.slice(<span class="hljs-number">1</span>).join(<span class="hljs-string">"/"</span>)}</span>`</span>)));
      }
    }

    <span class="hljs-comment">// blog assets</span>
    <span class="hljs-keyword">else</span> <span class="hljs-keyword">if</span> (myBlog.assetsPathnames.some(requestMatches)) {
      <span class="hljs-built_in">console</span>.log(<span class="hljs-string">"this is a request for other blog assets"</span>, parsedUrl.pathname)
      <span class="hljs-keyword">const</span> assetUrl = request.url.replace(parsedUrl.hostname, myBlog.hostname);
      <span class="hljs-built_in">console</span>.log(assetUrl)
      <span class="hljs-keyword">return</span> fetch(assetUrl)
    }

    <span class="hljs-built_in">console</span>.log(<span class="hljs-string">"this is a request to my root domain"</span>, parsedUrl.host, parsedUrl.pathname);
    <span class="hljs-comment">// unrelated stuff, do nothing</span>
    <span class="hljs-keyword">return</span> fetch(request)
  }
</code></pre>
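<p>Finally, the event listener mentioned earlier ties everything together. It isn't shown in the snippets above, but a standard Workers entry point looks like this:</p>
<pre><code class="lang-javascript">// Worker entry point: respond to every fetch event with handleRequest's result.
addEventListener('fetch', (event) => {
  event.respondWith(handleRequest(event.request));
});
</code></pre>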
<h3 id="heading-conclusion">🙌 Conclusion</h3>
<p>If you came this far, I congratulate you on trudging boldly to enlightenment, and welcome any contributions and suggestions for this project on my <a target="_blank" href="https://github.com/phreakyphoenix/cloudflare-deploy-subdomain-to-subdirectory">GitHub repo</a>.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1678747981062/86599f46-cf96-4298-baf9-30ebd1834aa7.webp" alt="meme video congraatulatory" class="image--center mx-auto" /></p>
<p>Thanks for reading, please show some love and share with your followers on social media. See you in the next article 👋</p>
]]></content:encoded></item><item><title><![CDATA[Subdomains vs Subdirectories: Which is right for your blog?]]></title><description><![CDATA[Introduction
When starting a blog, one of the first things to consider is where to host it. Two popular options are to use a subdomain or a subdirectory. While both methods allow you to add a blog to your site, they differ in several key areas, inclu...]]></description><link>https://blog.phreakyphoenix.tech/subdomains-vs-subdirectories-which-is-right-for-your-blog</link><guid isPermaLink="true">https://blog.phreakyphoenix.tech/subdomains-vs-subdirectories-which-is-right-for-your-blog</guid><category><![CDATA[SEO]]></category><category><![CDATA[SEO for Developers]]></category><category><![CDATA[Search Engines]]></category><category><![CDATA[Blogging]]></category><category><![CDATA[#content marketing]]></category><dc:creator><![CDATA[Aditya Jyoti Paul]]></dc:creator><pubDate>Wed, 22 Feb 2023 14:21:58 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1677078257938/6790f590-afab-4ada-bea0-d36b173b7503.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2 id="heading-introduction">Introduction</h2>
<p>When starting a blog, one of the first things to consider is where to host it. Two popular options are to use a subdomain or a subdirectory. While both methods allow you to add a blog to your site, they differ in several key areas, including search engine optimization (SEO) and technical implementation. In this article, we'll compare the pros and cons of each approach to help you decide which one is best for your blog.</p>
<h2 id="heading-subdomains">Subdomains</h2>
<p>A subdomain is a domain that is part of a larger domain, but is technically a separate website. It typically appears as a prefix to the main domain, like <a target="_blank" href="http://blog.example.com">blog.example.com</a>. Here are some of the benefits and drawbacks of using a subdomain for your blog:</p>
<h3 id="heading-pros">Pros</h3>
<ol>
<li><p><strong>Separate identity</strong>: A subdomain can give your blog a separate identity from the main site, which may be useful if you want to differentiate your blog from the rest of your content.</p>
</li>
<li><p><strong>Easy setup</strong>: Setting up a subdomain is relatively easy, and most web hosts allow you to create one with just a few clicks.</p>
</li>
<li><p><strong>Scalability</strong>: If your blog grows in popularity and you need more resources to handle traffic, you can upgrade your subdomain to a dedicated server or cloud hosting.</p>
</li>
<li><p><strong>Internationalization</strong>: Subdomains can be a useful way to target international audiences. For example, if you have a blog in English and want to offer a version in Spanish, you could create a subdomain like <a target="_blank" href="http://es.example.com">es.example.com</a>.</p>
</li>
</ol>
<h3 id="heading-cons">Cons</h3>
<ol>
<li><p><strong>SEO challenges</strong>: While subdomains are a good way to organize content, search engines may view them as separate websites with less authority than the main site. This means it may be more difficult to rank well in search engine results pages (SERPs).</p>
</li>
<li><p><strong>Backlink dilution</strong>: When you use a subdomain, any backlinks to your blog will not help the main site's SEO. This is because the subdomain is seen as a separate website with its own backlink profile.</p>
</li>
<li><p><strong>Technical complexity</strong>: Setting up a subdomain requires more technical expertise than using a subdirectory. You may need to create a separate hosting account, configure DNS settings, and manage multiple databases.</p>
</li>
</ol>
<h2 id="heading-subdirectories">Subdirectories</h2>
<p>A subdirectory is a folder on your main domain that contains your blog content. It typically appears as a suffix to the main domain, like <a target="_blank" href="http://example.com/blog">example.com/blog</a>. Here are some of the benefits and drawbacks of using a subdirectory for your blog:</p>
<h3 id="heading-pros-1">Pros</h3>
<ol>
<li><p><strong>SEO benefits</strong>: Using a subdirectory can give your blog a boost in search engine rankings. This is because the subdirectory is seen as part of the main site, and any authority the main site has is passed on to the blog.</p>
</li>
<li><p><strong>Easy setup</strong>: Most content management systems (CMS) allow you to set up a blog within a subdirectory with minimal technical knowledge.</p>
</li>
<li><p><strong>Backlink authority</strong>: Any backlinks to your blog will also help the main site's SEO, as they are seen as part of the same domain.</p>
</li>
<li><p><strong>Consistency</strong>: By using a subdirectory, you can keep your blog consistent with the rest of your site's branding and design.</p>
</li>
</ol>
<h3 id="heading-cons-1">Cons</h3>
<ol>
<li><p><strong>Limited scalability</strong>: If your blog grows in popularity, you may need to upgrade your hosting plan to handle the traffic. This can be more difficult than upgrading a subdomain, as it may require a more expensive plan or dedicated server.</p>
</li>
<li><p><strong>Less flexibility</strong>: If you want to create a separate identity for your blog, using a subdirectory may not be the best choice. It can be more difficult to differentiate your blog from the rest of your content with a subdirectory.</p>
</li>
</ol>
<p>In conclusion, the decision to use subdomains or subdirectories for blogs depends on the specific needs and goals of the website. Subdomains may be easier to set up and can be useful for targeting specific markets or branding purposes, but they can also dilute keywords and backlinks, making it harder for the root domain to rank in search results.</p>
<p>On the other hand, subdirectories may require more setup and configuration, but they can help consolidate keyword and backlink authority, which can improve the overall SEO of the website. In addition, subdirectories are already built into basic websites, making them a more straightforward option for many website owners.</p>
<blockquote>
<p>Based on the points discussed in this article, it is clear that <strong>subdirectories are a superior option for blogs, particularly for website owners who want to improve their SEO</strong>. While subdomains may be appealing for branding purposes, they can dilute keyword and backlink authority, which can ultimately hurt the SEO of the root domain.</p>
</blockquote>
<p>Overall, the decision to use subdomains or subdirectories for blogs requires careful consideration of the specific needs and goals of the website, as well as an understanding of the potential trade-offs and benefits of each approach.</p>
<h2 id="heading-some-industry-references">Some Industry References</h2>
<ol>
<li><p>A <a target="_blank" href="https://moz.com/blog/google-organic-click-through-rates-in-2014">study by Moz</a> in 2014 also found that using subdirectories was better for SEO than using subdomains. The study analyzed 90,000 root domains and found that subdirectories were more likely to rank higher in SERPs than subdomains. Check out a video explainer <a target="_blank" href="https://fast.wistia.net/embed/iframe/dyzyhh500a?seo=false&amp;videoFoam=true">here</a>.</p>
</li>
<li><p><a target="_blank" href="https://www.searchenginejournal.com/google-treats-subdomains-subdirectories-john-mueller-says/254687/">Google’s own John Mueller has also stated that subdirectories are generally better than subdomains for SEO, when the content is related to the site</a>. In a <a target="_blank" href="https://www.youtube.com/watch?v=kQIyk-2-wRg">webmaster hangout in 2018</a>, Mueller said that it’s easier for Google to understand the structure and hierarchy of a website that uses subdirectories, as opposed to one that uses subdomains.</p>
</li>
<li><p>Here's a great <a target="_blank" href="https://www.semrush.com/blog/subdomain-vs-subdirectory/">in-depth article</a> from Semrush on this topic.</p>
</li>
</ol>
<h2 id="heading-conclusion">Conclusion</h2>
<p>These studies and statements from industry experts suggest that using subdirectories can have SEO benefits compared to subdomains. However, it’s important to note that there are many factors that can impact a website’s SEO, and using subdirectories alone is not a guaranteed way to improve SEO.</p>
]]></content:encoded></item><item><title><![CDATA[Hacking the Food Sourcing Problem for young Indians]]></title><description><![CDATA[What's this all about?
Jugaad and Dhanda are two words that are ingrained into every Indian's soul. But what is the one thing that we all love and the absence of which keeps us awake at night? Food.

This is article especially suited for students and...]]></description><link>https://blog.phreakyphoenix.tech/hacking-the-food-sourcing-problem-for-young-indians</link><guid isPermaLink="true">https://blog.phreakyphoenix.tech/hacking-the-food-sourcing-problem-for-young-indians</guid><category><![CDATA[technology]]></category><dc:creator><![CDATA[Aditya Jyoti Paul]]></dc:creator><pubDate>Sun, 19 Feb 2023 15:53:09 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1676817324571/f12800ca-edcc-4f2d-bfaf-012e1cbcfd9d.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h3 id="heading-whats-this-all-about">What's this all about?</h3>
<p><em>Jugaad</em> and <em>Dhanda</em> are two words that are ingrained into every Indian's soul. But what is the one thing that we all love and the absence of which keeps us awake at night? Food.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1676484939549/9f29992a-e93d-48d4-bd84-19a99f8cfd5c.png" alt class="image--center mx-auto" /></p>
<p>This article is especially suited to students and bachelors, but it can be useful for everyone (who eats).</p>
<p>I'll keep updating the article as I find new tips, tricks and interesting insights. The article is broken into sections, so jump to the parts you're curious about.</p>
<h3 id="heading-backstory-and-motivation">Backstory and Motivation</h3>
<p>A lot of my friends, colleagues and past batchmates are surprised by how I've survived for the last 5-6 years solely by ordering food online. The first-time reactions are usually surprise, shock and amusement, and the most common questions can be grouped into the following:</p>
<ul>
<li><p>How I don't fall sick</p>
</li>
<li><p>How I save money</p>
</li>
<li><p>How I maintain my schedule</p>
</li>
</ul>
<p>To answer these questions, and to make life a bit easier for anyone who wants to follow in my footsteps, I'm penning my experiences and tons of tips and tricks in this blog.</p>
<h3 id="heading-what-is-this-food-sourcing-problem">What is this Food Sourcing Problem?</h3>
<p>I'm a Computer Science graduate with a penchant for algorithms, so naturally, one day when I left for college running short on cash, I sat down to understand where all the time and money goes and how I could optimize the process of putting food in my belly as simply and efficiently as possible.</p>
<p>The food sourcing problem is about minimizing the effective spend on food, which depends on both the cost of purchasing the food and the time spent acquiring it.</p>
<p>Below is an equation for the mathematically inclined</p>
<p>$$effectiveSpend = cost + timeSpent \times timeValue$$</p>
<p>Now when ordering online, we massively mitigate the time component, and usually pay a slight premium compared to getting it at the restaurant. This is to compensate for the time and effort of the delivery agent, the food delivery platform and other middle stakeholders.</p>
<p>But sometimes ordering online is also much cheaper: many restaurants on these aggregator platforms are actually cloud/ghost kitchens. In this business model, they have no seating area for customers and cater solely to online orders, often sharing cooking resources among multiple brands under one roof.</p>
<p>So what does it all mean? Is ordering online cheaper? Let's break this down in the next section.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1676488648110/0ddefba1-6efe-41bb-b78f-71ea7cf59182.png" alt class="image--center mx-auto" /></p>
<h3 id="heading-analysis">Analysis</h3>
<p>Online food ordering is right for you if you're living alone, whether as a student, bachelor, etc. Let's look at an example.</p>
<p>If you live alone in Kolkata, your monthly spend on food will include:</p>
<ol>
<li><p>Cost of Raw Materials (Oil, salt, veggies, rice, atta etc): Rs 3000-3500</p>
</li>
<li><p>Time Spent: 60 hrs/month.</p>
</li>
</ol>
<p>For two people, the monthly spend is:</p>
<ol>
<li><p>Cost of Raw Materials (Oil, salt, veggies, rice, atta etc): Rs 4500-5000</p>
</li>
<li><p>Time Spent: ~70 hrs/month</p>
</li>
</ol>
<p>Thus, if you're cooking, you'd certainly be better off investing in a refrigerator and sharing the food and cooking responsibilities with a roommate.</p>
<p>Now let's analyze online ordering; I'll use my own numbers in this example.</p>
<p>Living alone:</p>
<ol>
<li><p>Cost of food: Rs 4500/month (after discounts, including all charges)</p>
</li>
<li><p>Time spent: ~5 hrs/month</p>
</li>
</ol>
<p>For 2 people:</p>
<ol>
<li><p>Cost of food: Rs 5000-6000/month (after discounts, including all charges)</p>
</li>
<li><p>Time spent: ~6 hrs/month</p>
</li>
</ol>
<p>Below is a tabular illustration of this analysis; effective spend accounts for the cost of raw materials and cooking fuel along with the monetary value of the time spent.</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td></td><td>Cost of Raw Materials and Gas</td><td>Time Spent</td><td>Effective Spend</td></tr>
</thead>
<tbody>
<tr>
<td>Cooking for 1 person</td><td>Rs 3200</td><td>60 hrs/month</td><td>~ Rs 11,000</td></tr>
<tr>
<td>Cooking for 2 people</td><td>Rs 4700</td><td>70 hrs/month</td><td>~ Rs 13,800</td></tr>
<tr>
<td>Ordering for 1 person</td><td>Rs 4500</td><td>5 hrs/month</td><td>~ Rs 5,150</td></tr>
<tr>
<td>Ordering for 2 people</td><td>Rs 5500</td><td>6 hrs/month</td><td>~ Rs 6,300</td></tr>
</tbody>
</table>
</div><p>Assumptions: The value of 1 hour of time has been assumed to be Rs 130 for this illustration; typical hourly rates in India range from Rs 90 to Rs 220.<br /><em>I encourage you to fill this table with your numbers to see how much you'd save.</em></p>
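<p>As a quick worked example with these assumptions, effective spend = cost + hours × hourly rate, so the cooking-for-one and ordering-for-one rows follow directly:</p>
<p>$$3200 + 60 \times 130 = 11000 \qquad 4500 + 5 \times 130 = 5150$$</p>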
<p>Now that we've shown you can save over 50% even at this hourly rate, I hope I have your attention. The savings only grow from here: if your hourly rate exceeds Rs 1,500, your monthly food expenses are worth only ~4 hours of work, so saving those hours becomes pivotal and the effective savings approach ~90%.</p>
<h3 id="heading-the-solution">The Solution</h3>
<p>Of course online delivery is cheaper, just as ridesharing is cheaper; the unit economics speak for themselves, and it's only a matter of time until more people accept it and build it into their lives. But succeeding at this, and making significant savings, still comes with a learning curve, and I'm here to make that journey simpler.</p>
<p>So here are my super secret tips to be successful in this online ordering world</p>
<ol>
<li><p><strong>Act decisively</strong>: Always have a clear idea of what you want to order; browsing restaurants for hours will cost you the very hours you set out to save. I spend no more than 10 mins/day ordering, of which ~8 mins go to browsing restaurants, comparing discounts (sometimes across devices) and making the payment (usually instantaneous); the rest is spent guiding the delivery partner to my place.</p>
</li>
<li><p><strong>Know the aggregator app like the back of your hand</strong>: I can recite in my sleep the items I usually order, the restaurants I order from, the discount and offer codes, the card benefits, and when each discount starts and ends every day. This becomes second nature, the way day traders know the price levels of every major index they trade and the macro factors affecting them. Initially, you might want to sit down once and learn about the various offers and features of the aggregator app you choose; I, for example, use Swiggy.</p>
</li>
<li><p><strong>Know the card offers:</strong> You can speed up your savings with cashback. Cards like the <a target="_blank" href="https://www.sbicard.com/invite/j0Rbm8xCOnL">SBI Cashback Card</a> give 5% cashback on all online transactions including Swiggy, Zomato, Blinkit, BigBasket etc. (excluding rent, wallet reloads etc.). There are many good offers on these platforms from cards by Axis, OneCard, Slice, AU Bank and Union Bank of India as well.</p>
</li>
<li><p><strong>Find out the best times to order:</strong> Restaurant discounts are not static, and ordering at peak hours is a sure-shot way of not getting a good deal. As you explore the platform, keep noting when each restaurant's offers start and end; this varies for each restaurant. I, for example, find amazing breakfast deals around 8am-noon and dinner deals 10pm-2am, YMMV.</p>
</li>
<li><p><strong>Try to increase the average order value</strong>: It's possible to save ~25% more by ordering items with a combined order value over Rs 400; if you already have a microwave and a fridge, you can save quite a lot more this way. Another idea, discussed above, is to find a food partner.</p>
</li>
<li><p><strong>Don't underestimate combining offers:</strong> I often save over 70% on each item by:</p>
<ol>
<li><p><strong>Buying Swiggy vouchers from Woohoo/Gyftr: ~12%</strong>: the voucher is already discounted by ~8%, on top of which I get 5% cashback.</p>
</li>
<li><p><strong>Using the Swiggy voucher on restaurants offering ~60% off.</strong><br /> In fact, many times, you'd be able to save even up to 30% compared to having the same item in the restaurant.</p>
<p> You could also use many card offers directly on Swiggy; for example, Axis Select and My Zone have 40% off offers. If you're an Airtel customer, the Airtel Axis card, which gives 10% cashback, is a decent deal too, but I find the cashback card + voucher strategy much better.</p>
</li>
</ol>
</li>
<li><p><strong>Use the subscriptions to the fullest</strong>: If you're using Swiggy, Zomato or a similar aggregator service regularly, make sure you sign up for their subscription. It was a steal at around Rs 799 for a year when I was switched over from the Swiggy Super Binge plan.</p>
<p> Now the prices seem to have gone up drastically, to around Rs 2,000. For me the subscription might still make sense, but now I'd suggest taking the plan only if you order regularly, and preferably order for your family, friends etc. too.</p>
</li>
</ol>
<blockquote>
<p>The biggest reason I'd choose the subscription is the <strong>free delivery</strong>.<br />To find out if it's right for you, just ask yourself this:</p>
<p><strong>"How many orders would you do in a year and how much on average would you like to pay for each delivery?"</strong></p>
<p>If the product of these two comes to at least 75% of the subscription cost, get the subscription; the additional benefits for dining out, grocery orders, etc. make up the difference.</p>
</blockquote>
<p>In the image below, you'll see I've saved over 11k this year, but in reality the value added by Swiggy One is closer to 7k: even without the subscription, you'd have 'saved' ~3k anyway through restaurant discounts, and many of the 'only for Swiggy One customers' offers are just rebranded regular offers.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1676546445166/ba70309e-d770-4bbe-b04d-ef9dedea5b45.jpeg" alt class="image--center mx-auto" /></p>
<h3 id="heading-conclusion">Conclusion</h3>
<p>Thus we discussed how to save 50-90% per month by ordering online. I also wanted to touch on the other two questions I get, i.e. how I maintain my schedule and stay fit. Personally, I found online ordering even better than cooking for my schedule, and as long as I'm not ordering heavily processed food every day (like KFC), I've noticed no major detriment to my health. Honestly, I found externally ordered food much healthier than the super oily mess food in my canteen. But again, YMMV.</p>
<p>Thanks to the amazing delivery agents making our lives easier every day.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1676494485673/b5bf5772-49d1-4316-8610-2eb12207aaf7.png" alt class="image--center mx-auto" /></p>
<p>💸 <strong>Support my work and earn Rs 500</strong></p>
<p>One of the key money-saving strategies for online ordering involves the SBI Cashback Credit Card, which I'd like to recommend.</p>
<p><strong>You can earn an additional Rs 500</strong> by using my referral code j0Rbm8xCOnL, or by <strong>applying for any SBI card via this link</strong> <a target="_blank" href="https://www.sbicard.com/invite/j0Rbm8xCOnL">sbicard.com/invite/j0Rbm8xCOnL</a>. Once you are onboarded, I too would receive the same incentive which helps support this work.</p>
<p>I've been using this card for almost a year and recouped its annual fee of Rs 999 + GST within just two months of cashback. The fee is waived anyway on spending Rs 2 lakh a year, so this card is working extremely well for me.</p>
<p><strong>Parting thoughts</strong></p>
<p>Cheers to your first 500 Rs saved through this. 🥳 Wish you all the best in your journey of simplifying and optimizing food sourcing in your life.</p>
<p>Do you have any other tips and tricks that I missed? Let me know in the comments. ✌️</p>
]]></content:encoded></item><item><title><![CDATA[Paper Review: Understanding Deep Learning Requires Rethinking Generalization]]></title><description><![CDATA[Introduction
The paper “Understanding Deep Learning Requires Rethinking Generalization” [1] caused quite a stir in the Deep Learning and Machine Learning research communities, and was one of the three papers awarded the Best Paper Award in ICLR 2017....]]></description><link>https://blog.phreakyphoenix.tech/paper-review-understanding-deep-learning-requires-rethinking-generalization</link><guid isPermaLink="true">https://blog.phreakyphoenix.tech/paper-review-understanding-deep-learning-requires-rethinking-generalization</guid><category><![CDATA[research]]></category><category><![CDATA[Artificial Intelligence]]></category><category><![CDATA[Machine Learning]]></category><category><![CDATA[Deep Learning]]></category><dc:creator><![CDATA[Aditya Jyoti Paul]]></dc:creator><pubDate>Mon, 20 Apr 2020 00:58:05 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1673918080909/22fb1647-2cc5-41fa-a9b0-a4e84f97c757.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h3 id="heading-introduction">Introduction</h3>
<p>The paper “Understanding Deep Learning Requires Rethinking Generalization” [1] caused quite a stir in the Deep Learning and Machine Learning research communities, and was one of the three papers to receive a Best Paper Award at ICLR 2017. This paper approaches the question:</p>
<p><code>"What is it that distinguishes neural networks that generalize well from those that don’t?"</code></p>
<p>It tries to dispel some existing misconceptions in that regard through a host of randomization tests, and touches upon how generalization error may or may not be related to regularization and loss. It also sheds light on some interesting points, such as the finite-sample expressivity of neural nets, and the fact that SGD converges to the minimum l2-norm solution of the Gram-matrix system, yet that norm is not predictive of generalization error. Let’s look at these findings in a bit more detail.</p>
<h3 id="heading-findings">Findings</h3>
<p>The most intriguing and unique strategy adopted by the researchers is <strong>randomization testing</strong>: they train neural networks on true labels, on partially corrupted labels (a fraction of labels replaced by draws from a uniform distribution), on completely random labels, on shuffled pixels (first with a single fixed permutation applied to every image, then with an independent random permutation per image), and even on Gaussian noise matched to the statistics of the original dataset. Any correlation between image and label that is meaningful to humans is thereby progressively destroyed. Despite this, the authors find that even on complete noise the network ‘shatters’ the training data, i.e. fits it with 100% accuracy, using the same network structure and hyperparameters!</p>
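<p><em>To make these randomization schemes concrete, here is a minimal NumPy sketch (my own illustration, not code from the paper) of the two simplest corruptions: replacing a fraction of the labels with uniformly random ones, and permuting pixels either with one fixed permutation shared by all images or with an independent permutation per image:</em></p>

```python
import numpy as np

def corrupt_labels(y, p, num_classes, rng):
    """Replace a fraction p of labels with labels drawn uniformly at random.

    p = 0.0 keeps the true labels; p = 1.0 yields completely random labels.
    """
    y = np.asarray(y).copy()
    mask = rng.random(len(y)) < p                      # which examples to corrupt
    y[mask] = rng.integers(0, num_classes, size=mask.sum())
    return y

def shuffle_pixels(X, rng, per_image=False):
    """Permute pixels: one fixed permutation for all images, or a fresh one per image."""
    flat = np.asarray(X).reshape(len(X), -1)
    if per_image:
        shuffled = np.stack([img[rng.permutation(flat.shape[1])] for img in flat])
    else:
        shuffled = flat[:, rng.permutation(flat.shape[1])]
    return shuffled.reshape(np.shape(X))

rng = np.random.default_rng(42)
y_true = rng.integers(0, 10, size=1000)                # stand-in for CIFAR-10 labels
y_rand = corrupt_labels(y_true, 1.0, 10, rng)          # fully random labels
X_demo = rng.normal(size=(8, 32, 32))                  # stand-in for images
X_shuf = shuffle_pixels(X_demo, rng)                   # same permutation everywhere
```

<p><em>The striking experimental fact is that one fixed architecture with fixed hyperparameters reaches 100% training accuracy on every one of these variants.</em></p>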
<p>This shows that <strong>deep neural networks can easily fit random labels</strong>: given enough parameters, their effective capacity is sufficient to memorize the entire dataset, and optimization on random labels remains easy, with training time increasing only by a reasonably small constant factor. This refutes some of the findings of the paper “Deep Nets Don’t Learn via Memorization” by Krueger et al. from the same conference. It is not a complete paradox, though: the two groups observed some similar results but, viewing them in different lights, drew different conclusions.</p>
<p>For example, both research groups observed that as label noise increases, networks take somewhat longer to shatter the training data. Krueger et al. concluded from this that neural nets do not memorize the data and capture only patterns. Zhang et al., on the contrary, concluded that since training time grew only by a small factor, the networks might really be memorizing the training data; they additionally note that Gaussian noise converged faster than random labels, corroborating their hypothesis. Finally, they note that <strong>generalization error certainly depends on the choice of network</strong>: among the architectures tested, Inception gave the lowest generalization error, compared against AlexNet and an MLP with 512 hidden units.</p>
<p><em>The paper discusses the effectiveness of some regularization techniques:</em></p>
<ul>
<li><p>Data augmentation yields better generalization than weight decay, though the biggest gains come from changing the model architecture.</p>
</li>
<li><p>Early stopping can improve generalization, but not always; batch normalization also improves generalization.</p>
</li>
</ul>
<p>The authors write, <code>“Explicit regularization may improve generalization but is neither necessary nor by itself sufficient.”</code></p>
<p>The Universal Approximation Theorem states that a single hidden layer containing a finite number of neurons can approximate any continuous function on compact subsets of Euclidean space, but it says nothing about the algorithmic learnability of those parameters, i.e. how feasibly or quickly a neural net can actually learn such weights. The authors extend this idea to finite samples with their theorem: <code>“There exists a two-layer neural network with ReLU activations and 2n + d weights that can represent any function on a sample of size n in d dimensions.”</code></p>
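<p><em>The 2n + d construction is explicit enough to verify numerically. Below is my own sketch (following the spirit of the paper’s construction, not its code) of a width-n two-layer ReLU network, with parameters a in R^d and b, w in R^n, i.e. exactly 2n + d weights, that exactly interpolates n arbitrary targets:</em></p>

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 10
X = rng.normal(size=(n, d))            # n arbitrary points in R^d
y = rng.normal(size=n)                 # n arbitrary real targets

# Parameters: a in R^d, b and w in R^n  ->  2n + d weights in total.
a = rng.normal(size=d)                 # random projection; separates the points a.s.
order = np.argsort(X @ a)
X, y = X[order], y[order]
z = X @ a                              # sorted projections z_1 < ... < z_n

b = np.empty(n)                        # thresholds interleaved with the z_i
b[0] = z[0] - 1.0
b[1:] = (z[:-1] + z[1:]) / 2.0

# relu(z_i - b_j) is lower-triangular in (i, j) with a positive diagonal,
# so the linear system for the output weights w always has a solution.
A = np.maximum(z[:, None] - b[None, :], 0.0)
w = np.linalg.solve(A, y)

def net(x):
    """c(x) = sum_j w_j * relu(<a, x> - b_j)"""
    return np.maximum(x @ a - b, 0.0) @ w

preds = np.array([net(xi) for xi in X])
```

<p><em>This demonstrates expressivity only: the network can represent any labelling of the sample, which says nothing about whether SGD would find such weights or how they would generalize.</em></p>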
<h3 id="heading-conclusion">Conclusion</h3>
<p>The main findings of the paper can be summarized as: <code>“Both explicit and implicit regularizers could help to improve the generalization performance. However, it is unlikely that the regularizers are the fundamental reason for generalization.”</code></p>
<p>This paper has a lot of interesting results, though it does not offer solutions to all the problems it raises. I find it as groundbreaking as, say, the AlexNet paper of 2012 or the Dropout paper of 2014 <em>(references below)</em>, both of which gave us profound insights into neural networks.</p>
<h4 id="heading-references">References</h4>
<ol>
<li><p><a target="_blank" href="https://arxiv.org/abs/1611.03530">Zhang, Chiyuan, Samy Bengio, Moritz Hardt, Benjamin Recht, and Oriol Vinyals. "Understanding deep learning requires rethinking generalization." arXiv preprint arXiv:1611.03530 (2016).</a></p>
</li>
<li><p><a target="_blank" href="https://www.youtube.com/watch?v=O42vde4tbG0">Rodney LaLonde's paper reading at UCF CRCV.</a></p>
</li>
<li><p><a target="_blank" href="https://papers.nips.cc/paper/2012/hash/c399862d3b9d6b76c8436e924a68c45b-Abstract.html">Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "Imagenet classification with deep convolutional neural networks." Advances in Neural Information Processing Systems 25 (2012): 1097-1105.</a></p>
</li>
<li><p><a target="_blank" href="https://jmlr.org/papers/v15/srivastava14a.html">Srivastava, Nitish, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. "Dropout: a simple way to prevent neural networks from overfitting." The journal of machine learning research 15, no. 1 (2014): 1929-1958.</a></p>
</li>
</ol>
]]></content:encoded></item></channel></rss>