Programming Notes
nacho4d avatar

Programming Notes

@nacho4d

Stop Using AWS Root Credentials — Set Up IAM Identity Center Properly

Open a terminal and run this:

cat ~/.aws/credentials

If you see something like this:

[default]
aws_access_key_id = ...
aws_secret_access_key = .../...

You have a problem. Those are long-lived static credentials sitting in plain text on your laptop. This post explains why that is dangerous and walks you through fixing it properly using AWS IAM Identity Center.


1. Why This Matters — The Problem With What You Probably Have

The bad practice

When most people start with AWS they do something like this: create an access key, run aws configure, paste the key in, and get to work. It takes five minutes and it works. The problem is what you end up with:

  • A permanent key that never expires unless you manually rotate it
  • That key in plain text in ~/.aws/credentials
  • Possibly committed to a git repo, copied into CI/CD pipelines, or living in .env files across your projects
  • If those credentials belong to root — unlimited, unrestricted access to everything in your account

One accidental git push, one leaked dotfile, one compromised laptop — and someone can run up thousands of dollars in crypto mining charges before you notice. This happens constantly. Search for it and you will find no shortage of horror stories.

Why root credentials are especially dangerous

The AWS root user is not just a very powerful IAM user. It exists outside the IAM system entirely. No policy can restrict it. No permission boundary applies to it. If someone gets root credentials they can close your account, remove MFA, and delete all your resources — and no IAM policy you write can stop them.

AWS itself says it clearly in their IAM best practices:

Safeguard your root user credentials and don't use them for everyday tasks.

The fix

Replace static credentials with temporary credentials that expire automatically. The modern AWS way to do this is IAM Identity Center — you log in via browser, get credentials that last a few hours, and nothing sensitive ever sits on disk permanently.


2. Concepts and Vocabulary

Before touching the console, make sure these terms are clear. AWS documentation uses them constantly and mixing them up is where most confusion starts.

AWS Account

The top-level container. Everything you create in AWS — EC2 instances, S3 buckets, Lambda functions — lives inside an account. An account is identified by a 12-digit ID like 089123123123. You can have multiple accounts (one for dev, one for prod, etc.).

Root User

The identity created when you first sign up for an AWS account. Identified by the email address you used. It has unlimited power and cannot be restricted. Use it only for the handful of tasks AWS requires root for (like closing the account). Lock it away with MFA and never create access keys for it.

IAM User

A permanent identity inside the IAM service. Has a username, password, and optionally access keys. IAM users made sense in 2010. Today they are discouraged for human access because they have long-lived credentials that are easy to mishandle. AWS's own documentation now recommends against them for most cases. See: IAM Users.

IAM Role

A temporary identity that can be assumed by a principal (a person, a service, an application). When you assume a role, AWS gives you short-lived credentials for that session. Roles are the foundation of secure AWS access — everything modern in AWS uses them.

IAM Policy

A JSON document that says what actions are allowed or denied on which resources. Policies are attached to roles (or users). They are the answer to: what can this identity actually do?

AWS Organizations

A service that groups multiple AWS accounts under one management structure. Even if you only have one account today, enabling Organizations is the right first step — it is free, it unlocks IAM Identity Center, and it prepares you for multi-account setups later. See: AWS Organizations.

IAM Identity Center

The modern central place to manage human access to AWS accounts. Formerly called AWS SSO (Single Sign-On). It manages users, groups, and permission sets. It issues temporary credentials on login. It is free. See: IAM Identity Center docs.

Permission Set

An IAM Identity Center concept. Think of it as a named role template — for example, AdministratorAccess or ReadOnly. You assign a permission set to a user on a specific account. IAM Identity Center creates the actual IAM role behind the scenes.

SSO Session

A named configuration in your local ~/.aws/config that points to your IAM Identity Center portal. When you run aws sso login, it uses this session to know where to authenticate you. You can have multiple sessions for different AWS organisations (personal, client A, client B, etc.).


3. Services Used

All these services are free. Placed in order of setup.

Service Purpose in this setup
AWS Organizations Groups accounts together. Required for IAM Identity Center organization instance.
AWS Identity and Access Management (IAM) Classic identity service. Still used underneath — Identity Center creates IAM roles behind the scenes.
AWS IAM Identity Center Manages SSO users, permission sets, and account assignments. Issues temporary credentials.
AWS Security Token Service (STS) Issues short-lived temporary credentials when you log in. You never interact with it directly.

4. Setting Up IAM Identity Center — Step by Step

You need to be logged into the AWS console as root for the first few steps. After setup, root goes back in the drawer permanently.

Step 1 — Secure root first

Root should have MFA and no access keys. Check now:

  1. In the console, top-right corner → your account name → Security credentials
  2. Under Multi-factor authentication — confirm MFA is enabled. If not, add it now. A hardware security key (FIDO2/U2F) is best. An authenticator app is fine.
  3. Under Access keys — if any key exists, deactivate and delete it. Root should never have an access key.

Step 2 — Enable billing access for IAM identities

By default, even an admin SSO user cannot see your billing console. Fix this now while you are still in root:

  1. Top-right → your account name → Security credentials
  2. Scroll to IAM user and role access to Billing informationEdit
  3. Check Activate IAM AccessUpdate

Step 3 — Enable AWS Organizations

  1. Search for AWS Organizations in the console search bar
  2. Click Create an organization → confirm

Your account becomes the management account. This is free and does not change how your account works. You will also see a banner offering to centralise root access for member accounts — click Enable in IAM. This lets you manage root credentials of any future member accounts centrally, which is a security best practice.

Step 4 — Enable IAM Identity Center

Important: Change your AWS region before clicking Enable. IAM Identity Center is regional and cannot be moved later without deleting it and starting over. Pick the region nearest to you or your team.

  1. Change region in the top-right dropdown — e.g. ap-northeast-1 for Tokyo
  2. Search for IAM Identity Center
  3. Click EnableEnable with AWS Organizations

You will see a success message with a new instance ID. Keep this page open.

Step 5 — Create your SSO admin user

This is the identity you will use for all daily work from now on. Not root. Not an IAM user.

  1. IAM Identity Center left menu → UsersAdd user
  2. Fill in: username (e.g. yourname-admin), email address, first and last name
  3. Click through to Add user
  4. Check your email for an invitation from no-reply@login.awsapps.com
  5. Click the invite link, set a password, and set up MFA on this user too

Step 6 — Create a Permission Set

  1. Left menu → Permission setsCreate permission set
  2. Choose Predefined permission set
  3. Select AdministratorAccess
  4. Keep the default name → Create

For a personal account, AdministratorAccess is fine — you are the only user. For team or client work you would later create scoped-down sets like TerraformDeploy or ReadOnly.

Step 7 — Assign the user to your account

  1. Left menu → AWS accounts
  2. Check the checkbox next to your account
  3. Click Assign users or groups
  4. Select your SSO user → Next
  5. Select AdministratorAccessNextSubmit

AWS provisions a real IAM role behind the scenes with a trust policy allowing your SSO user to assume it. You never manage that role directly.

Step 8 — Note your portal URL

Left menu → Dashboard. You will see your AWS access portal URL:

https://d-xxxxxxxxxx.awsapps.com/start

Save this. It is your login page from now on — not console.aws.amazon.com. You can customise it under Settings → Identity source.

Step 9 — Clean up old credentials

Delete the old access key from AWS first — deleting the local file is not enough, the key still works in AWS until you revoke it there.

  1. Console → top-right → Security credentials (while logged in as whichever IAM user owned the key)
  2. Scroll to Access keys → deactivate → Delete

Then clean up locally:

# Wipe the credentials file
> ~/.aws/credentials

# Also check for other places like:
# ~/.zshrc ~/.bashrc ~/.bash_profile ~/.zprofile
# *.tf, *.env, *.tfvars

Step 10 — Configure the AWS CLI with SSO

aws configure sso

Answer the prompts:

SSO session name (Recommended): personal
SSO start URL [None]: https://d-xxxxxxxxxx.awsapps.com/start
SSO region [None]: ap-northeast-1
SSO registration scopes [sso:account:access]: [press Enter]

# A browser opens — log in as your SSO user and authorise the CLI

Default client Region [None]: ap-northeast-1
CLI default output format [None]: json
CLI profile name [AdministratorAccess]: admin

Your ~/.aws/config now looks like this. No credentials file. No keys anywhere.

[profile admin]
sso_session = personal
sso_account_id = 123456789012
sso_role_name = AdministratorAccess
region = ap-northeast-1
output = json

[sso-session personal]
sso_start_url = https://d-xxxxxxxxxx.awsapps.com/start
sso_region = ap-northeast-1
sso_registration_scopes = sso:account:access

5. Verify Everything Works

Log in and confirm your identity:

aws sso login --sso-session personal

aws sts get-caller-identity --profile admin

You should see output like this:

{
    "UserId": "AROAXXXXXXXXXXXXXXXXX:yourname-admin",
    "Account": "123456789012",
    "Arn": "arn:aws:sts::123456789012:assumed-role/AWSReservedSSO_AdministratorAccess_xxx/yourname-admin"
}

Three things to confirm in that output:

  • UserId ends with your SSO username — not root, not an IAM user
  • Arn contains assumed-role — you are using a temporary role, not permanent credentials
  • Account is your correct account ID

When the session expires (typically after 8 hours), just run aws sso login --sso-session personal again. That is your entire credential management workflow now.

5.1. Bonus: Try with Terraform

Use it with Terraform by setting the profile:
export AWS_PROFILE=admin
terraform init
terraform plan

Or in your provider.tf:

provider "aws" {
  region  = "ap-northeast-1"
  profile = "admin"
}

6. Before and After

Before After
Identity Root user or IAM user SSO user (yourname-admin)
Credentials type Static — never expire Temporary — expire in hours
Stored where Plain text in ~/.aws/credentials Nowhere — fetched fresh on each login
Rotation Manual — easy to forget Automatic — always
If credentials leak Permanent access until manually revoked Expires on its own within hours
Multi-account ready No — separate credentials per account Yes — one login, all accounts
Terraform workflow Keys in file, just works (dangerously) aws sso login then terraform

7. What's Next

This setup is the foundation. From here the natural next topics are:

  • Multiple AWS accounts — separate accounts for dev, staging, prod under the same Organization. Same SSO login, different permission sets per account.
  • Scoped permission sets — instead of AdministratorAccess everywhere, create a TerraformDeploy set with only what Terraform actually needs. Use IAM Access Analyzer to discover what was actually used.
  • CI/CD with zero credentials — for GitHub Actions, use OIDC federation. No static keys in your pipeline either.
  • External identity provider — connect IAM Identity Center to Okta, Google Workspace, or Azure AD. Users log in with company credentials, AWS access follows automatically.
  • Service Control Policies — organisation-level guardrails. Even admins cannot exceed them. The right tool for enforcing region restrictions, mandatory tagging, or blocking risky services.
Hope it helped. See you later 🐊

LangChain basics — tool calling from scratch

A practical summary of how LangChain works under the hood, built up step by step.

1. Initialising the model

from langchain.chat_models import init_chat_model

llm = init_chat_model("gpt-4o-mini", model_provider="openai")
init_chat_model wraps the underlying provider SDK. No network call happens here — it just configures a client object.

2. Defining tools with @tool

from langchain_core.tools import tool

tools = []

@tool
def extract_video_id(url: str) -> str:
    """Extracts the 11-character YouTube video ID from a URL."""
    ...

tools.append(extract_video_id)
@tool is a LangChain-specific decorator — not a Python built-in. The moment Python reads the @tool line, it:
  1. Reads __name__ → becomes the tool's name
  2. Reads __doc__ → becomes the description sent to the LLM
  3. Reads type annotations → builds a Pydantic input schema
  4. Wraps everything into a StructuredTool object
The original function is preserved inside .func and is never lost. After decoration, your variable (extract_video_id) immediately becomes a StructuredTool — it never exists as a plain function in your namespace. Calling a tool directly (outside of LLM flow):
extract_video_id.invoke({"url": "https://youtube.com/watch?v=abc"})
extract_video_id.func("https://youtube.com/watch?v=abc")  # raw function

3. Binding tools to the model

llm_with_tools = llm.bind_tools(tools)

This is pure preparation — no network call. It creates a new runnable that stores the tool definitions to include in every subsequent API call. When you later call .invoke(), LangChain serialises the tool definitions into the OpenAI JSON format:

{
  "model": "gpt-4o-mini",
  "messages": [...],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "extract_video_id",
        "description": "...",
        "parameters": { "type": "object", "properties": { "url": { "type": "string" } } }
      }
    }
  ]
}

4. Message types

LangChain uses typed message classes. Each serialises differently into the JSON sent to the API. The direction is always consistent:

ClassDirectionWho creates it
SystemMessageclient → LLMdeveloper (instructions, persona)
HumanMessageclient → LLMend user
AIMessageLLM → clientthe model
ToolMessageclient → LLMyour tool execution code

Below are samples on how to create each message:

from langchain_core.messages import SystemMessage, HumanMessage, AIMessage, ToolMessage
  • SystemMessage sits permanently at index 0 — never recreated between turns.
    messages = [
        SystemMessage(content="You are a helpful assistant that does this and that...")
    ]
  • HumanMessage is appended fresh every user turn.
    user_input = input("You: ")
    messages.append(HumanMessage(content=user_input))
  • AIMessage is the return value of invoke() — append it as-is, never construct one by hand.
    response = llm_with_tools.invoke(messages)
    messages.append(response)          # response is already an AIMessage
  • ToolMessage requires a tool_call_id matching the id the LLM assigned to its request. That id lives in tool_call["id"], which is why execute_tool reads it from the tool call dict rather than generating one itself.
    for tool_call in response.tool_calls:
        messages.append(execute_tool(tool_call))   # execute_tool returns a ToolMessage

After a full exchange with one tool call, messages[] looks like this:

[
    SystemMessage(content="You are a helpful assistant ..."),   # permanent
    HumanMessage(content="Summarise this video: youtube.com/..."),
    AIMessage(tool_calls=[{"name": "extract_video_id", "args": {...}, "id": "tu_abc"}]),
    ToolMessage(content="T-D1OfcDW1M", tool_call_id="tu_abc"),
    AIMessage(content="Here is a summary ..."),
]

This list is the LLM's entire working memory — sent in full on every invoke() call.

5. The tool call loop

When you call llm_with_tools.invoke(messages), the LLM does not execute your tools. It returns an AIMessage that says "I want to call this tool with these arguments". Your code must execute the tool and send the result back.

sequenceDiagram
    participant P as Your program
    participant L as LLM API

    P->>L: `① POST /v1/messages — messages:[{role:"user",content:"summarize this video"}], tools:[extract_video_id,fetch_transcript]`

    L-->>P: `② 200 stop_reason:"tool_use" — content:[{type:"tool_use",id:"tu_abc",name:"extract_video_id",input:{url:"youtube.com/..."}}]`

    Note over P: `AIMessage(tool_calls) → invoke extract_video_id → returns "T-D1OfcDW1M"`

    P->>L: `③ POST /v1/messages — role:"assistant":{type:"tool_use",id:"tu_abc"}, role:"user":{type:"tool_result",tool_use_id:"tu_abc",content:"T-D1OfcDW1M"}`

    L-->>P: `④ 200 stop_reason:"tool_use" — content:[{type:"tool_use",id:"tu_xyz",name:"fetch_transcript",input:{video_id:"T-D1OfcDW1M"}}]`

    Note over P: `AIMessage(tool_calls) → invoke fetch_transcript → returns "[full transcript]"`

    P->>L: `⑤ POST /v1/messages — role:"user":{type:"tool_result",tool_use_id:"tu_xyz",content:"[full transcript]"}`

    L-->>P: `⑥ 200 stop_reason:"end_turn" — content:[{type:"text",text:"Here is a summary..."}]`

Key rules:

  • Always append the AIMessage to messages[] before appending the ToolMessage. The API requires them to appear in order.
  • When response.tool_calls is non-empty, the loop continues. When it is empty, the LLM is done. Note that the raw field name varies by provider — OpenAI calls it finish_reason, Anthropic calls it stop_reason. LangChain abstracts this behind response.tool_calls.
  • AIMessage.content is typically empty ('') while the LLM is in tool-calling mode — in most providers it has nothing to say to the user until all tools have been executed.

6. The tool mapping

tool_mapping = {t.name: t for t in tools}

Built from the tools list automatically — no need to maintain it by hand. Used to look up the callable from the string name the LLM returns:

tool_fn = tool_mapping[tool_call["name"]]
result  = tool_fn.invoke(tool_call["args"])

7. A reusable execute_tool helper

def execute_tool(tool_call):
    try:
        result = tool_mapping[tool_call["name"]].invoke(tool_call["args"])
        return ToolMessage(content=str(result), tool_call_id=tool_call["id"])
    except Exception as e:
        return ToolMessage(content=f"Error: {str(e)}", tool_call_id=tool_call["id"])

The try/except matters: if the tool throws, you still need to return a ToolMessage with the matching tool_call_id, otherwise messages[] becomes malformed and the next API call fails. Returning an error message lets the LLM see what went wrong and potentially recover.

8. A minimal agent loop

Putting it all together:

def run_agent(messages):  # note: mutates messages in place
    while True:
        response = llm_with_tools.invoke(messages)
        messages.append(response)

        if not response.tool_calls:   # LLM is done
            return response

        for tool_call in response.tool_calls:
            messages.append(execute_tool(tool_call))

The LLM can request multiple tools in a single response — hence the for loop over response.tool_calls. All results are appended before the next invoke().

This is essentially what LangGraph's ToolNode does internally — the tutorial is showing you the manual version so you understand what the framework automates.

9. Chaining

LangChain's chaining feature allows composing multiple steps into a single pipeline using the | operator — the same concept as Unix pipes:

my_chain = p1 | p2 | p3

Each step must return a Runnable, which allows calling my_chain.invoke({...}) on the whole pipeline. Data enters via invoke() and flows through each step in order — the chain itself is just a description of transformations, no data flows through it at definition time.

Two building blocks make this work:

  • RunnablePassthrough.assign(key=fn) — copies the current dict and adds one new key, where the value is the result of calling fn with the current dict. All previous keys are preserved.
  • RunnableLambda(fn) — wraps a plain function into a Runnable so it can participate in the | pipeline. Unlike assign, it replaces the entire value rather than adding a key — used as the final step to extract a single return value.

Here is a hardcoded chain implementation for our agent. Each comment shows the state of the dict x after that step:

summarization_chain = (

    # x = {"query": "...url..."}
    RunnablePassthrough.assign(
        messages=lambda x: [HumanMessage(content=x["query"])]
    )
    # x = {"query": "...", "messages": [HumanMessage]}

    | RunnablePassthrough.assign(
        ai_response=lambda x: llm_with_tools.invoke(x["messages"])
    )
    # x = {..., "ai_response": AIMessage(tool_calls=[extract_video_id])}

    | RunnablePassthrough.assign(
        tool_messages=lambda x: [
            execute_tool(tc) for tc in x["ai_response"].tool_calls
        ]
    )
    # x = {..., "tool_messages": [ToolMessage(content="T-D1OfcDW1M")]}

    | RunnablePassthrough.assign(
        messages=lambda x: x["messages"] + [x["ai_response"]] + x["tool_messages"]
    )
    # x = {..., "messages": [HumanMessage, AIMessage, ToolMessage]}
    # messages is now a proper conversation history for the next LLM call

    | RunnablePassthrough.assign(
        ai_response2=lambda x: llm_with_tools.invoke(x["messages"])
    )
    # x = {..., "ai_response2": AIMessage(tool_calls=[fetch_transcript])}

    | RunnablePassthrough.assign(
        tool_messages2=lambda x: [
            execute_tool(tc) for tc in x["ai_response2"].tool_calls
        ]
    )
    # x = {..., "tool_messages2": [ToolMessage(content="[full transcript]")]}

    | RunnablePassthrough.assign(
        messages=lambda x: x["messages"] + [x["ai_response2"]] + x["tool_messages2"]
    )
    # x = {..., "messages": [HumanMessage, AIMessage, ToolMessage, AIMessage, ToolMessage]}

    | RunnablePassthrough.assign(
        summary=lambda x: llm_with_tools.invoke(x["messages"]).content
    )
    # x = {..., "summary": "The video discusses LangChain..."}

    | RunnableLambda(lambda x: x["summary"])
    # returns just the string — chain is done
)

Note that this chain is hardcoded for exactly two tool calls in a fixed order. It is useful as a learning device to make each round-trip explicit, but it is not suitable for production. The plain function equivalent is:

def summarize_video(query: str) -> str:
    # Build initial messages
    messages = [HumanMessage(content=query)]

    # First LLM call → wants to call extract_video_id
    ai_response = llm_with_tools.invoke(messages)
    messages.append(ai_response)
    for tc in ai_response.tool_calls:
        messages.append(execute_tool(tc))

    # Second LLM call → wants to call fetch_transcript
    ai_response2 = llm_with_tools.invoke(messages)
    messages.append(ai_response2)
    for tc in ai_response2.tool_calls:
        messages.append(execute_tool(tc))

    # Final LLM call → generates summary
    return llm_with_tools.invoke(messages).content

Both are functionally identical. The chain style becomes compelling in production for three reasons:

  1. Observability — the pipe style hooks automatically into tracing and monitoring tools such as LangSmith, without any manual instrumentation.
  2. Composability — chains can be passed around, combined, and reused as building blocks across different parts of an application.
  3. Parallelism — LangChain can run independent steps concurrently within a chain, without manual thread management.

10. Quick reference

ConceptWhat it does
@tool decoratorconverts a function into a StructuredTool at decoration time
bind_tools(tools)stores tool definitions locally, no API call
invoke(messages)sends full message history + tool schemas to the API
AIMessage.tool_callslist of tool calls the LLM wants executed; empty when the LLM is done
ToolMessagecarries a tool result back to the LLM; must include matching tool_call_id
messages[]the LLM's entire working memory, sent in full on every invoke() call
RunnablePassthrough.assigncopies the current dict and adds one new key; used to build pipelines step by step
RunnableLambdawraps a plain function into a Runnable so it can join a | pipeline

pnpm - Performant node package manager

What is pnpm?

Performant npm (command name: pnpm) is a fast, disk-efficient package manager for Node.js that serves as an alternative to npm. Unlike npm, which comes bundled with Node.js, pnpm is a separate tool you install yourself.

Key differences with npm:

  • Stores packages in a global cache and links them to projects via hard links.
    Each package version is downloaded once and reused across all projects
  • Uses a stricter node_modules structure with symlinks, preventing packages from accessing undeclared dependencies

Benefits:

  • For development: Dramatically faster installations and significant disk space savings specially good for developers when working on multiple projects
  • For production: Since modern applications are containerized, improvements are modest - slightly faster builds and marginally smaller container images due to deduplicated sub-dependencies. It introduces no overhead while providing small benefits, making it a solid choice for production environments

Install

# Using npm
npm install -g pnpm
# Using Homebrew
brew install pnpm
# Using curl (standalone)
curl -fsSL https://get.pnpm.io/install.sh | sh -

Configure

pnpm needs the $PNPM_HOME environment variable to be properly configured. It also needs to be added to your $PATH. The command below will set this up and modify ~/.zshrc so future terminal sessions work correctly.

pnpm setup
Now open a new terminal window and continue there OR apply the changes manually in the current session: source ~/.zshrc

Verify Installation

Check pnpm is installed

pnpm --version
# As of December 2025, should be something like: 10.x.y

Check where pnpm is installed

which pnpm

Verify pnpm store location

pnpm store path
# Should show something like: /Users/yourusername/Library/pnpm/store/v10

Basic Usage

New project

Create package.json. Similar to npm init
pnpm init

Install/Add a package

pnpm add react          # Adds to dependencies in package.json
pnpm add -D typescript  # Adds to devDependencies in package.json
pnpm add -g nodemon     # Global install, not added to any package.json

Existing projects

Install all dependencies from package.json. Equivalent to npm install

pnpm install
# or just
pnpm i

Run scripts

Execute a "dev" script defined in "scripts" section in package.json. Equivalent to npm run dev

pnpm run dev
# or omit 'run'
pnpm dev

Other common commands

pnpm remove <package>   # Uninstall and remove from package.json
pnpm update             # Update dependencies
pnpm list               # List installed packages
pnpm outdated           # Check for outdated packages
That is all.
See you later 🐊!

Running Granite Models with Ollama

If you want to run Granite models for inference, Ollama is probably the easiest approach. Here's how to get started and troubleshoot common issues.

Installation

Install Ollama using Homebrew:

brew install ollama

Choosing a Model

Visit the Ollama Granite4 library to explore available models.

Models with the -h suffix use the newer hybrid architecture, which offers several advantages over the condensed architecture:

  • Smaller model size
  • Larger context windows (up to 1M tokens)
  • More efficient inference by activating only a portion of parameters
Name Size Context Input
granite4:latest 2.1GB 128K Text
granite4:micro 2.1GB 128K Text
granite4:micro-h 1.9GB 1M Text
granite4:tiny-h 4.2GB 1M Text
granite4:small-h 19GB 1M Text

Pulling and Running a Model

Pull your chosen model:

ollama pull granite4:small-h

Test that it works:

ollama run granite4:small-h
>>> Hello, what company is behind your creators?
The company that created me is called IBM (International Business Machines Corporation). ...
>>>

Troubleshooting: Error 500

Models with the hybrid architecture require a recent version of Ollama. If you encounter a 500 Internal Server Error when trying to run a model, you likely need to update Ollama and then try to run the model again.

Error:

ollama run granite4:tiny-h
Error: 500 Internal Server Error: unable to load model: /Users/ignacio/.ollama/models/blobs/sha256-9811e90b0eecf2b194aafad5bb386279f338a45412a9e6f86b718cca6626c495

ollama --version
ollama version is 0.11.4

Debugging the Error

To understand the root cause, restart the Ollama server with debug logging enabled:

brew services stop ollama
OLLAMA_DEBUG=2 ollama serve
ollama serve full log
OLLAMA_DEBUG=2 ollama serve
time=2025-10-26T18:40:19.270+09:00 level=INFO source=routes.go:1304 msg="server config" env="map[HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:4096 OLLAMA_DEBUG:DEBUG-4 OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/Users/ignacio/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_SCHED_SPREAD:false http_proxy: https_proxy: no_proxy:]"
time=2025-10-26T18:40:19.271+09:00 level=INFO source=images.go:477 msg="total blobs: 23"
time=2025-10-26T18:40:19.272+09:00 level=INFO source=images.go:484 msg="total unused blobs removed: 0"
time=2025-10-26T18:40:19.272+09:00 level=INFO source=routes.go:1357 msg="Listening on 127.0.0.1:11434 (version 0.11.4)"
time=2025-10-26T18:40:19.273+09:00 level=DEBUG source=sched.go:106 msg="starting llm scheduler"
time=2025-10-26T18:40:19.313+09:00 level=INFO source=types.go:130 msg="inference compute" id=0 library=metal variant="" compute="" driver=0.0 name="" total="107.5 GiB" available="107.5 GiB"
[GIN] 2025/10/26 - 18:40:34 | 200 |      30.458µs |       127.0.0.1 | HEAD     "/"
time=2025-10-26T18:40:34.677+09:00 level=DEBUG source=ggml.go:208 msg="key with type not found" key=general.alignment default=32
[GIN] 2025/10/26 - 18:40:34 | 200 |   20.910833ms |       127.0.0.1 | POST     "/api/show"
time=2025-10-26T18:40:34.692+09:00 level=DEBUG source=sched.go:183 msg="updating default concurrency" OLLAMA_MAX_LOADED_MODELS=3 gpu_count=1
time=2025-10-26T18:40:34.697+09:00 level=DEBUG source=ggml.go:208 msg="key with type not found" key=general.alignment default=32
time=2025-10-26T18:40:34.707+09:00 level=DEBUG source=sched.go:226 msg="loading first model" model=/Users/ignacio/.ollama/models/blobs/sha256-9811e90b0eecf2b194aafad5bb386279f338a45412a9e6f86b718cca6626c495
time=2025-10-26T18:40:34.707+09:00 level=DEBUG source=memory.go:111 msg=evaluating library=metal gpu_count=1 available="[107.5 GiB]"
time=2025-10-26T18:40:34.707+09:00 level=DEBUG source=ggml.go:208 msg="key with type not found" key=granitehybrid.vision.block_count default=0
time=2025-10-26T18:40:34.707+09:00 level=DEBUG source=ggml.go:208 msg="key with type not found" key=granitehybrid.attention.head_count_kv default=0
time=2025-10-26T18:40:34.707+09:00 level=DEBUG source=ggml.go:208 msg="key with type not found" key=granitehybrid.attention.head_count_kv default="&{size:0 values:[]}"
time=2025-10-26T18:40:34.707+09:00 level=DEBUG source=ggml.go:208 msg="key with type not found" key=granitehybrid.attention.key_length default=128
time=2025-10-26T18:40:34.707+09:00 level=DEBUG source=ggml.go:208 msg="key with type not found" key=granitehybrid.attention.value_length default=128
time=2025-10-26T18:40:34.707+09:00 level=DEBUG source=ggml.go:208 msg="key with type not found" key=granitehybrid.attention.head_count_kv default=0
time=2025-10-26T18:40:34.707+09:00 level=DEBUG source=ggml.go:208 msg="key with type not found" key=granitehybrid.attention.head_count_kv default="&{size:0 values:[]}"
time=2025-10-26T18:40:34.707+09:00 level=INFO source=sched.go:786 msg="new model will fit in available VRAM in single GPU, loading" model=/Users/ignacio/.ollama/models/blobs/sha256-9811e90b0eecf2b194aafad5bb386279f338a45412a9e6f86b718cca6626c495 gpu=0 parallel=1 available=115448725504 required="5.5 GiB"
time=2025-10-26T18:40:34.708+09:00 level=INFO source=server.go:135 msg="system memory" total="128.0 GiB" free="98.2 GiB" free_swap="0 B"
time=2025-10-26T18:40:34.708+09:00 level=DEBUG source=memory.go:111 msg=evaluating library=metal gpu_count=1 available="[107.5 GiB]"
time=2025-10-26T18:40:34.708+09:00 level=DEBUG source=ggml.go:208 msg="key with type not found" key=granitehybrid.vision.block_count default=0
time=2025-10-26T18:40:34.708+09:00 level=DEBUG source=ggml.go:208 msg="key with type not found" key=granitehybrid.attention.head_count_kv default=0
time=2025-10-26T18:40:34.708+09:00 level=DEBUG source=ggml.go:208 msg="key with type not found" key=granitehybrid.attention.head_count_kv default="&{size:0 values:[]}"
time=2025-10-26T18:40:34.708+09:00 level=DEBUG source=ggml.go:208 msg="key with type not found" key=granitehybrid.attention.key_length default=128
time=2025-10-26T18:40:34.708+09:00 level=DEBUG source=ggml.go:208 msg="key with type not found" key=granitehybrid.attention.value_length default=128
time=2025-10-26T18:40:34.708+09:00 level=DEBUG source=ggml.go:208 msg="key with type not found" key=granitehybrid.attention.head_count_kv default=0
time=2025-10-26T18:40:34.708+09:00 level=DEBUG source=ggml.go:208 msg="key with type not found" key=granitehybrid.attention.head_count_kv default="&{size:0 values:[]}"
time=2025-10-26T18:40:34.708+09:00 level=INFO source=server.go:175 msg=offload library=metal layers.requested=-1 layers.model=41 layers.offload=41 layers.split="" memory.available="[107.5 GiB]" memory.gpu_overhead="0 B" memory.required.full="5.5 GiB" memory.required.partial="5.5 GiB" memory.required.kv="320.0 MiB" memory.required.allocations="[5.5 GiB]" memory.weights.total="3.9 GiB" memory.weights.repeating="3.8 GiB" memory.weights.nonrepeating="120.6 MiB" memory.graph.full="640.0 MiB" memory.graph.partial="640.0 MiB"
time=2025-10-26T18:40:34.708+09:00 level=DEBUG source=server.go:291 msg="compatible gpu libraries" compatible=[]
llama_model_load_from_file_impl: using device Metal (Apple M4 Max) - 110100 MiB free
llama_model_loader: loaded meta data with 42 key-value pairs and 666 tensors from /Users/ignacio/.ollama/models/blobs/sha256-9811e90b0eecf2b194aafad5bb386279f338a45412a9e6f86b718cca6626c495 (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = granitehybrid
llama_model_loader: - kv   1:                               general.type str              = model
llama_model_loader: - kv   2:                               general.name str              = Granite 4.0 H Tiny
llama_model_loader: - kv   3:                         general.size_label str              = 64x994M
llama_model_loader: - kv   4:                            general.license str              = apache-2.0
llama_model_loader: - kv   5:                               general.tags arr[str,2]       = ["language", "granite-4.0"]
llama_model_loader: - kv   6:                  granitehybrid.block_count u32              = 40
llama_model_loader: - kv   7:               granitehybrid.context_length u32              = 1048576
llama_model_loader: - kv   8:             granitehybrid.embedding_length u32              = 1536
llama_model_loader: - kv   9:          granitehybrid.feed_forward_length u32              = 512
llama_model_loader: - kv  10:         granitehybrid.attention.head_count u32              = 12
llama_model_loader: - kv  11:      granitehybrid.attention.head_count_kv arr[i32,40]      = [0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 0, 0, ...
llama_model_loader: - kv  12:               granitehybrid.rope.freq_base f32              = 10000.000000
llama_model_loader: - kv  13: granitehybrid.attention.layer_norm_rms_epsilon f32              = 0.000010
llama_model_loader: - kv  14:                 granitehybrid.expert_count u32              = 64
llama_model_loader: - kv  15:            granitehybrid.expert_used_count u32              = 6
llama_model_loader: - kv  16:                   granitehybrid.vocab_size u32              = 100352
llama_model_loader: - kv  17:         granitehybrid.rope.dimension_count u32              = 128
llama_model_loader: - kv  18:              granitehybrid.attention.scale f32              = 0.007812
llama_model_loader: - kv  19:              granitehybrid.embedding_scale f32              = 12.000000
llama_model_loader: - kv  20:               granitehybrid.residual_scale f32              = 0.220000
llama_model_loader: - kv  21:                  granitehybrid.logit_scale f32              = 6.000000
llama_model_loader: - kv  22: granitehybrid.expert_shared_feed_forward_length u32              = 1024
llama_model_loader: - kv  23:              granitehybrid.ssm.conv_kernel u32              = 4
llama_model_loader: - kv  24:               granitehybrid.ssm.state_size u32              = 128
llama_model_loader: - kv  25:              granitehybrid.ssm.group_count u32              = 1
llama_model_loader: - kv  26:               granitehybrid.ssm.inner_size u32              = 3072
llama_model_loader: - kv  27:           granitehybrid.ssm.time_step_rank u32              = 48
llama_model_loader: - kv  28:       granitehybrid.rope.scaling.finetuned bool             = false
llama_model_loader: - kv  29:                       tokenizer.ggml.model str              = gpt2
llama_model_loader: - kv  30:                         tokenizer.ggml.pre str              = dbrx
llama_model_loader: - kv  31:                      tokenizer.ggml.tokens arr[str,100352]  = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv  32:                  tokenizer.ggml.token_type arr[i32,100352]  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  33:                      tokenizer.ggml.merges arr[str,100000]  = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
llama_model_loader: - kv  34:                tokenizer.ggml.bos_token_id u32              = 100257
llama_model_loader: - kv  35:                tokenizer.ggml.eos_token_id u32              = 100257
llama_model_loader: - kv  36:            tokenizer.ggml.unknown_token_id u32              = 100269
llama_model_loader: - kv  37:            tokenizer.ggml.padding_token_id u32              = 100256
llama_model_loader: - kv  38:               tokenizer.ggml.add_bos_token bool             = false
llama_model_loader: - kv  39:                    tokenizer.chat_template str              = {%- set tools_system_message_prefix =...
llama_model_loader: - kv  40:               general.quantization_version u32              = 2
llama_model_loader: - kv  41:                          general.file_type u32              = 15
llama_model_loader: - type  f32:  337 tensors
llama_model_loader: - type q4_K:  286 tensors
llama_model_loader: - type q6_K:   43 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type   = Q4_K - Medium
print_info: file size   = 3.94 GiB (4.87 BPW) 
llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'granitehybrid'
llama_model_load_from_file_impl: failed to load model
time=2025-10-26T18:40:34.741+09:00 level=INFO source=sched.go:453 msg="NewLlamaServer failed" model=/Users/ignacio/.ollama/models/blobs/sha256-9811e90b0eecf2b194aafad5bb386279f338a45412a9e6f86b718cca6626c495 error="unable to load model: /Users/ignacio/.ollama/models/blobs/sha256-9811e90b0eecf2b194aafad5bb386279f338a45412a9e6f86b718cca6626c495"
[GIN] 2025/10/26 - 18:40:34 | 500 |   60.900167ms |       127.0.0.1 | POST     "/api/generate"

The key error message reveals the issue:

llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'granitehybrid'

This confirms that the current version of Ollama doesn't recognize the hybrid architecture, so an update is required.

Update Ollama using Homebrew:

brew upgrade ollama

Verify the update and test again:

ollama --version
ollama version is 0.12.6

ollama run granite4:tiny-h
>>> Hello, what are you doing today?
Hello! I'm here to assist you with any questions or tasks you may have...
>>>

That's it! You should now be able to run Granite models with Ollama without issues.

UV - Python Package Manager

What is UV?

UV is a next-generation Python toolchain written in Rust. It unifies package management, virtual environments, and Python version control into a single, high-performance binary — no activation steps, no Python preinstalled.

Why UV Exists

Python development traditionally requires multiple tools — pip, venv, and pyenv — leading to slow installs, multiple CLIs, and the “bootstrap problem” (needing Python installed first). Even Poetry, which improved dependency resolution, still relies on external Python managers and mutates the shell via activated environments, causing inconsistent command behavior.

UV rethinks the workflow. It replaces multiple tools with one binary, supports the official PEP-518 pyproject.toml standard, and runs code with uv run — activating virtual environments internally per process without affecting your terminal. It also includes a Python version manager, SAT-based dependency resolution, and lockfile reproducibility.

The result: a tool that is 10–20× faster than pip or Poetry, works the same across macOS, Linux, and Windows, and removes most friction from Python development workflows.

Feature Comparison

Category Aspect pyenv + pip + venv Poetry + pyenv UV
Dependency Management Dependency resolution algorithm Sequential / greedy (may install conflicts) ✅ SAT solver (correct, reproducible) ✅ SAT solver (optimized, parallel)
Reproducibility with lock files requirements.txt: none
pip freeze: partial, no hashes
✅ Yes (poetry.lock) ✅ Yes (uv.lock)
Performance (install ~50 packages) 1× baseline (30–45s) ~1.3× faster (25–35s) ✅ 10–15× faster (2–3s, optimized in Rust)
Environment Management Virtual environment activation Manual activation required; mutates shell poetry run auto-activates uv run auto-activates internally (no shell mutation)
Tool integration 3 separate tools 2 tools (still needs pyenv) ✅ Single tool
Python Version Management Install / switch Python versions Via pyenv (mutates shell environment) Via pyenv (mutates shell environment) ✅ Built-in (no shell mutation)
Bootstrap requirement Needs Python preinstalled Needs Python preinstalled ✅ None (self-contained binary)

Typical Workflows Compared

Note: Poetry + pyenv covers most of the same workflow steps as UV, but UV is significantly faster, isolates environments without shell mutation, and includes a built-in Python version manager.

pyenv + pip + venv Poetry + pyenv UV
pyenv install 3.12
pyenv local 3.12
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
(wait 30–60s)
pyenv install 3.12
pyenv local 3.12
poetry install
(wait 20–40s)
uv python install 3.12
uv sync
(wait 2–3s)
python script.py poetry run python script.py uv run python script.py
deactivate (no deactivate needed) (no deactivate needed)

Using UV

There are several post about uv around. Below are just some simple commands I found I will be using every day.

Install UV

# macOS / Linux
curl -LsSf https://astral.sh/uv/install.sh | sh

# Windows (PowerShell)
powershell -c "irm https://astral.sh/uv/install.ps1 | iex"

# Verify
uv --version

Start a Project

# Create and enter project
mkdir my-project && cd my-project

# Initialize pyproject.toml and environment
uv init

# Or if pyproject.toml already exists:
uv sync

Minimal pyproject.toml example:

[project]
name = "my-project"
version = "0.1.0"
requires-python = ">=3.11"

dependencies = [
    "requests>=2.31.0",
]

Managing Dependencies

Add / remove packages updates pyproject.toml automatically:

[project]
dependencies = [
    "requests>=2.31.0",
]

[tool.uv.dev-dependencies]
pytest = "*"
# Add main dependency
uv add requests

# Add dev dependency
uv add --dev pytest

# Remove dependency
uv remove requests

Dependency Groups (Optional)

Defined in pyproject.toml:

[dependency-groups]
dev = ["pytest", "mypy", "ruff"]
docs = ["mkdocs", "mkdocs-material"]
# Install groups
uv sync --group dev
uv sync --group docs

Python Versions

Stored in pyproject.toml:

[tool.uv.python]
3.12 = "*"
3.11 = "*"
# Install / switch Python
uv python install 3.12
uv python install 3.11

# List installed versions
uv python list

Running Scripts & Tools

Commands are defined or come from installed packages; uv run reads from pyproject.toml or the environment:

[project.scripts]
my-tool = "my_package.main:cli"
# Run Python script or module
uv run python main.py
uv run python -m module_name

# Run installed tools
uv run pytest
uv run mypy src/
uv run ruff check src/
uv run ruff format src/

# Run a custom script
uv run my-tool
Hope it helps.

Camera rotation angles in AVFoundation

I recently worked with AVFoundation camera and used AVCaptureDevice.RotationCoordinator to sync device rotation to video preview layer. That was straight forward but the problem was ouput sampleBuffer was not rotated.

So, for frame processing purposes I rotated the sampleBuffer with below function.
private func cgImageOrientation(from rotationAngle: CGFloat) -> CGImagePropertyOrientation {
    switch rotationAngle {
    case 0: return isFrontCamera ? .upMirrored : .up
    case 90: return isFrontCamera ? .rightMirrored : .right
    case 180: return isFrontCamera ? .downMirrored : .down
    case 270: return isFrontCamera ? .leftMirrored : .left
    default: return isFrontCamera ? .upMirrored : .up
    }
}
This are just notes for angles I observed and so far got successfull results. (Probably it is device specific so I would like to find a more general way of dealing with this ...)
Model Device orientation Camera position Preview Layer
Connection Angle
Camera func cgImageOrientation result ok?
iPad (A16) landscape top 180 front downMirrored ok
portrait right 90 front rightMirrored ok
landscape bottom 0 front upMirrored ok
portrait left 270 front leftMirrored ok

Service Logs in Azure Web App via Terraform

I spend lot of time to finally setup logs of my web application in Azure correctly.

Mission

After creating my web app. I want to see the logs in "Log Analytics Workspace" > "Logs".
In order to so so, I need first to tell the "App Service Logs" to see my STDOUT and STDERR. This is quite easy in the UI; It is basically just turn the "Application Logging" toggle to "File System".
I want to this but not via UI but via Terraform scripts.
application logs

My Problem

Linux WebApp Documentation explains about what logs, application_logs and http_logs do but it does not explicitly say the relationship between them.
It turns out that application_logs needs http_logs to work properly. I was passing ONLY application_logs and it did not have any effect on the toggle in the UI

Solution

Pass both application_logs and http_logs, even if not interested in the http_logs it is required for application logs to work. Likely to be an implementation detail not well documented.
resource "azurerm_linux_web_app" "main" {
  name                = "${var.app_name}-web"
  location            = azurerm_resource_group.main.location
  resource_group_name = azurerm_resource_group.main.name
  service_plan_id     = azurerm_service_plan.main.id

  site_config {
    ...
  }

  app_settings = {
    ...
  }

  identity {
    ...
  }

  logs {
    application_logs {
      file_system_level = "Information"  # Options: Off, Error, Warning, Information, Verbose
    }

    http_logs {
      file_system {
        retention_in_days = 7
        retention_in_mb   = 35
      }
    }

    detailed_error_messages = true
    failed_request_tracing = true
  }
}

This work is licensed under BSD Zero Clause License | nacho4d ®