Gemma 4 Tool Calling Explained: Step-by-Step Guide

Imagine asking your AI model, “What’s the weather in Tokyo right now?” and instead of hallucinating an answer, it calls your actual Python function, fetches live data, and responds correctly. That is what the tool-calling capabilities in Google's Gemma 4 make possible. It is an exciting addition to open-weight AI: function calling that is structured, reliable, and built directly into the model.

Coupled with Ollama for local inference, it lets you build AI agents that do not depend on the cloud. Best of all, these agents can reach real-world APIs and services while the model runs locally, with no subscription. In this guide, we will cover the concept and the implementation architecture, as well as three tasks you can experiment with right away.

Conversational language models have limited knowledge, bounded by when they were trained. As a result, they can offer only an approximate answer when you ask for current market prices or current weather conditions. Tool calling addresses this gap by exposing ordinary functions to the model through an API wrapper, so that questions of this kind are answered by calling a tool rather than by guessing.

With tool calling enabled, the model can:

Recognize when it needs to retrieve external information
Identify the correct function based on the provided API
Compose correctly formatted function calls (with arguments)

It then waits until that code has run and returned its output, and composes a grounded answer based on the output it receives.

To clarify: the model never executes the functions the user has written. It only determines which functions to call and how to structure each call's argument list. The user's code executes the functions that the model requested through the API. In this scenario, the model is the brain, while the functions being called are the hands.
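The split between deciding and executing can be sketched in a few lines. This is a minimal illustration, not Ollama's actual output: the `tool_call` dictionary, the `add_numbers` function, and the `REGISTRY` name are all hypothetical stand-ins for a structured request a model might emit and the code that would run it.

```python
# Hypothetical structured request, shaped like an entry in a tool_calls block.
tool_call = {"function": {"name": "add_numbers", "arguments": {"a": 2, "b": 3}}}

def add_numbers(a: float, b: float) -> float:
    """An ordinary Python function: the 'hands' in the analogy."""
    return a + b

# Map tool names to the callables that implement them.
REGISTRY = {"add_numbers": add_numbers}

def execute_tool_call(call: dict):
    """Look up the requested function and run it with the given arguments."""
    func = REGISTRY[call["function"]["name"]]
    return func(**call["function"]["arguments"])

result = execute_tool_call(tool_call)
print(result)
```

The model only ever produces the dictionary; everything inside `execute_tool_call` runs in your own process, which is why you stay in control of what actually gets executed.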

Before you start writing code, it is useful to understand how everything works. Here is the loop that every tool call in Gemma 4 follows:

Define functions in Python to perform the actual tasks (e.g., retrieve weather data from an external source, query a database, convert money from one currency to another).
Create a JSON schema for each function you have created. The schema should contain the name of the function and its parameters (including their types).
When a user message arrives, you send both the tool schemas you have created and that message to the Ollama API.
The Ollama API returns data in a tool_calls block rather than plain text.
You execute the function using the arguments sent to you by the Ollama API.
You return the result to the Ollama API as a "role": "tool" message.
The model receives the result, and the Ollama API returns the answer to you in natural language.

This two-pass pattern is the foundation for every function-calling AI agent, including the examples shown below.
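The loop above can be traced without a running model. The sketch below replaces the Ollama call with a stub, so `fake_model`, `get_fake_temperature`, and the canned reply text are all hypothetical; only the message flow (user, assistant with tool_calls, tool, final answer) mirrors the steps described.

```python
def fake_model(messages: list) -> dict:
    """Stand-in for the model: request a tool first, then answer from its result."""
    last = messages[-1]
    if last["role"] == "tool":
        return {"role": "assistant", "content": f"The answer is: {last['content']}"}
    return {
        "role": "assistant",
        "content": "",
        "tool_calls": [
            {"function": {"name": "get_fake_temperature", "arguments": {"city": "Tokyo"}}}
        ],
    }

def get_fake_temperature(city: str) -> str:
    """Hypothetical tool that returns a canned reading instead of live data."""
    return f"{city}: 21 degrees"

def two_pass(query: str):
    tools = {"get_fake_temperature": get_fake_temperature}
    messages = [{"role": "user", "content": query}]

    msg = fake_model(messages)              # pass 1: model requests a tool
    messages.append(msg)
    for tc in msg.get("tool_calls", []):
        name = tc["function"]["name"]
        result = tools[name](**tc["function"]["arguments"])  # executed locally
        messages.append({"role": "tool", "content": result, "name": name})

    if msg.get("tool_calls"):
        msg = fake_model(messages)          # pass 2: model answers in plain text
    return msg["content"], messages

answer, transcript = two_pass("What's the weather in Tokyo?")
print(answer)
```

Swapping `fake_model` for a real API call leaves the rest of the function unchanged, which is the point of the pattern.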

To run these tasks, you need two components: Ollama installed locally on your machine, and the Gemma 4 Edge 2B model. There are no dependencies beyond the Python standard library, so you do not need to install any pip packages.

1. To install Ollama using the official install script:

# Install Ollama (macOS/Linux)
curl -fsSL https://ollama.com/install.sh | sh

2. To download the model (roughly 2.5 GB):

# Download the Gemma 4 Edge model (E2B)
ollama pull gemma4:e2b

After downloading the model, run ollama list to confirm it appears in the list of models. You can now connect to the running API at http://localhost:11434 and send requests to it using the helper function we will create:

import json
import urllib.parse
import urllib.request

def call_ollama(payload: dict) -> dict:
    # Serialize the request body and POST it to the local Ollama chat endpoint.
    data = json.dumps(payload).encode("utf-8")
    req = urllib.request.Request(
        "http://localhost:11434/api/chat",
        data=data,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))

No third-party libraries are needed, so the agent runs standalone and every step stays fully transparent.
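Every request in this guide repeats the same model tag and the same "stream": False setting. As an optional convenience, a small builder can assemble the payload in one place. This is only a sketch: `build_chat_payload` and `DEFAULT_MODEL` are hypothetical names, not part of Ollama.

```python
import json

DEFAULT_MODEL = "gemma4:e2b"  # model tag used throughout this guide

def build_chat_payload(messages: list, tools=None, model: str = DEFAULT_MODEL) -> dict:
    """Assemble the request body for the chat endpoint with streaming turned off."""
    payload = {"model": model, "messages": messages, "stream": False}
    if tools:
        # Only include the tools key when there is at least one schema to send.
        payload["tools"] = tools
    return payload

payload = build_chat_payload([{"role": "user", "content": "Hello"}])
body = json.dumps(payload).encode("utf-8")  # the bytes that would be posted
print(payload)
```

The result can be passed straight to a helper that posts a dictionary as JSON.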

Hands-on Task 01: Live Weather Lookup

The first tool uses Open-Meteo, a free weather API that needs no key, to pull live data for any location based on its latitude/longitude coordinates. To use this API, you will follow a sequence of steps:

1. Write your function in Python

def get_current_weather(city: str, unit: str = "celsius") -> str:
    # Step 1: resolve the city name to coordinates with the geocoding API.
    geo_url = f"https://geocoding-api.open-meteo.com/v1/search?name={urllib.parse.quote(city)}&count=1"
    with urllib.request.urlopen(geo_url) as r:
        geo = json.loads(r.read())
    if not geo.get("results"):
        return f"City {city} not found."
    loc = geo["results"][0]
    lat, lon = loc["latitude"], loc["longitude"]
    # Step 2: fetch the current conditions for those coordinates.
    url = (f"https://api.open-meteo.com/v1/forecast"
           f"?latitude={lat}&longitude={lon}"
           f"&current=temperature_2m,wind_speed_10m"
           f"&temperature_unit={unit}")
    with urllib.request.urlopen(url) as r:
        data = json.loads(r.read())
    c = data["current"]
    return f"{city}: {c['temperature_2m']}°, wind {c['wind_speed_10m']} km/h"

2. Define your JSON schema

This gives the model the information it needs, so that Gemma 4 knows exactly what the function does and what it expects when it is called.

weather_tool = {
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "Get live temperature and wind speed for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name, e.g. Mumbai"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
            },
            "required": ["city"]
        }
    }
}
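Because the schema already lists required parameters and allowed enum values, the same dictionary can be used to check the model's arguments before the function runs. The sketch below is an optional guard, not something Ollama provides: `validate_arguments` is a hypothetical helper, and `example_tool` is a copy of the weather schema so the block runs on its own.

```python
example_tool = {
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "Get live temperature and wind speed for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name, e.g. Mumbai"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["city"],
        },
    },
}

def validate_arguments(tool: dict, args: dict) -> list:
    """Return a list of problems found in args; an empty list means they look fine."""
    params = tool["function"]["parameters"]
    props = params.get("properties", {})
    errors = []
    # Every name in "required" must be present.
    for name in params.get("required", []):
        if name not in args:
            errors.append(f"missing required argument: {name}")
    # Every supplied argument must be declared and respect its enum, if any.
    for name, value in args.items():
        if name not in props:
            errors.append(f"unexpected argument: {name}")
            continue
        allowed = props[name].get("enum")
        if allowed is not None and value not in allowed:
            errors.append(f"{name} must be one of {allowed}")
    return errors

print(validate_arguments(example_tool, {"city": "Mumbai"}))
print(validate_arguments(example_tool, {"unit": "kelvin"}))
```

This check covers only required names and enums; it does not verify types, so it is a first line of defense rather than full JSON Schema validation.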

3. Create a query for your tool call (and handle the response that comes back)

messages = [{"role": "user", "content": "What's the weather in Mumbai right now?"}]
response = call_ollama({"model": "gemma4:e2b", "messages": messages, "tools": [weather_tool], "stream": False})
msg = response["message"]

if "tool_calls" in msg:
    tc = msg["tool_calls"][0]
    fn = tc["function"]["name"]
    args = tc["function"]["arguments"]
    result = get_current_weather(**args)  # executed locally

    # Send the tool result back so the model can phrase the final answer.
    messages.append(msg)
    messages.append({"role": "tool", "content": result, "name": fn})
    final = call_ollama({"model": "gemma4:e2b", "messages": messages, "tools": [weather_tool], "stream": False})
    print(final["message"]["content"])
else:
    # The model answered directly without requesting a tool.
    print(msg.get("content", ""))
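A tool that hits the network can fail: the API may be down, or the model may supply an argument the function does not accept. One possible way to keep the loop alive is to convert any exception into a text result that is returned to the model like any other tool output. The helper name `safe_execute` and the `flaky_tool` example are hypothetical.

```python
def safe_execute(func, args: dict) -> str:
    """Run a tool and always return a string, even when the tool raises."""
    try:
        return str(func(**args))
    except Exception as exc:
        # Report the failure as ordinary tool output instead of crashing the agent.
        return f"Tool error ({type(exc).__name__}): {exc}"

def flaky_tool(city: str) -> str:
    """Hypothetical tool that fails for an empty city name."""
    if not city:
        raise ValueError("city must not be empty")
    return f"{city}: ok"

print(safe_execute(flaky_tool, {"city": "Mumbai"}))
print(safe_execute(flaky_tool, {"city": ""}))
```

Passing the error text back as the tool result gives the model a chance to explain the failure to the user instead of the script ending with a traceback.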

Hands-on Task 02: Live Currency Converter

A typical LLM fails here by hallucinating exchange rates; it cannot provide accurate, up-to-date currency conversion. With the help of ExchangeRate-API, the converter can fetch the latest foreign exchange rates and convert accurately between two currencies.

Once you complete Steps 1-3 below, you will have a fully functioning converter in Gemma 4:

1. Write your Python function

def convert_currency(amount: float, from_curr: str, to_curr: str) -> str:
    # Fetch the latest rates with from_curr as the base currency.
    url = f"https://open.er-api.com/v6/latest/{from_curr.upper()}"
    with urllib.request.urlopen(url) as r:
        data = json.loads(r.read())
    rate = data["rates"].get(to_curr.upper())
    if not rate:
        return f"Currency {to_curr} not found."
    converted = round(amount * rate, 2)
    return f"{amount} {from_curr.upper()} = {converted} {to_curr.upper()} (rate: {rate})"
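If you want to test the conversion logic without calling the rates API, one option is to separate the arithmetic from the fetch. The sketch below assumes a rates dictionary has already been obtained; `convert_with_rates` is a hypothetical helper and the numbers in `sample_rates` are made up for illustration, not real exchange rates.

```python
def convert_with_rates(amount: float, rates: dict, to_curr: str) -> str:
    """Convert amount using an already-fetched mapping of currency code to rate."""
    code = to_curr.upper()
    rate = rates.get(code)
    if rate is None:
        return f"Currency {to_curr} not found."
    return f"{amount} -> {round(amount * rate, 2)} {code} (rate: {rate})"

# Made-up rates for illustration only.
sample_rates = {"USD": 0.012, "JPY": 1.8}

print(convert_with_rates(5000, sample_rates, "usd"))
print(convert_with_rates(5000, sample_rates, "XYZ"))
```

Keeping the math in its own function makes it easy to check the rounding and the not-found path offline, while the network call stays in the tool the model actually uses.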

2. Define your JSON schema

currency_tool = {
    "type": "function",
    "function": {
        "name": "convert_currency",
        "description": "Convert an amount between two currencies at live rates.",
        "parameters": {
            "type": "object",
            "properties": {
                "amount":    {"type": "number", "description": "Amount to convert"},
                "from_curr": {"type": "string", "description": "Source currency, e.g. USD"},
                "to_curr":   {"type": "string", "description": "Target currency, e.g. EUR"}
            },
            "required": ["amount", "from_curr", "to_curr"]
        }
    }
}

3. Test your solution using a natural language query

response = call_ollama({
    "model": "gemma4:e2b",
    "messages": [{"role": "user", "content": "How much is 5000 INR in USD today?"}],
    "tools": [currency_tool],
    "stream": False
})

Gemma 4 will process the natural language query and format a proper function call with amount = 5000, from_curr = 'INR', to_curr = 'USD'. The resulting call is then handled by the same feedback loop described in Task 01.

The third task combines several tools in one agent, and Gemma 4 excels at it. You can supply the model with multiple tools at once and submit a compound query. The model coordinates all the required calls in a single pass; manual chaining is unnecessary.

1. Add the timezone tool

def get_current_time(city: str) -> str:
    # Note: the zone is built as Asia/<city>, so this only works for cities
    # whose name is itself a time zone under Asia (e.g. Tokyo).
    url = f"https://timeapi.io/api/Time/current/zone?timeZone=Asia/{city}"
    with urllib.request.urlopen(url) as r:
        data = json.loads(r.read())
    return f"Current time in {city}: {data['time']}, {data['dayOfWeek']} {data['date']}"

time_tool = {
    "type": "function",
    "function": {
        "name": "get_current_time",
        "description": "Get the current local time in a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name for timezone, e.g. Tokyo"}
            },
            "required": ["city"]
        }
    }
}
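The time function above builds its zone as Asia/ followed by the city name, which works for Tokyo but not for a city such as Mumbai, whose IANA zone is Asia/Kolkata. One simple way around this is an explicit lookup table. The sketch below is only an illustration: `CITY_TO_ZONE` and `resolve_timezone` are hypothetical names, and the table would need an entry for every city you want to support.

```python
# Hand-maintained mapping from lowercase city name to IANA time zone.
CITY_TO_ZONE = {
    "tokyo": "Asia/Tokyo",
    "mumbai": "Asia/Kolkata",
    "london": "Europe/London",
}

def resolve_timezone(city: str):
    """Return the IANA zone for a known city, or None if it is not in the table."""
    return CITY_TO_ZONE.get(city.strip().lower())

for name in ["Tokyo", " Mumbai ", "Atlantis"]:
    print(name.strip(), "->", resolve_timezone(name))
```

A tool could call `resolve_timezone` first and return a clear "unknown city" message when it gets None, instead of sending an invalid zone to the time API.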

2. Build the multi-tool agent loop

TOOL_FUNCTIONS = {
    "get_current_weather": get_current_weather,
    "convert_currency": convert_currency,
    "get_current_time": get_current_time,
}

def run_agent(user_query: str):
    all_tools = [weather_tool, currency_tool, time_tool]
    messages = [{"role": "user", "content": user_query}]

    # First pass: the model decides which tools to call.
    response = call_ollama({"model": "gemma4:e2b", "messages": messages, "tools": all_tools, "stream": False})
    msg = response["message"]
    messages.append(msg)

    if "tool_calls" in msg:
        # Run every requested tool locally and record each result.
        for tc in msg["tool_calls"]:
            fn = tc["function"]["name"]
            args = tc["function"]["arguments"]
            result = TOOL_FUNCTIONS[fn](**args)
            messages.append({"role": "tool", "content": result, "name": fn})

        # Second pass: the model turns the tool results into an answer.
        final = call_ollama({"model": "gemma4:e2b", "messages": messages, "tools": all_tools, "stream": False})
        return final["message"]["content"]
    return msg.get("content", "")

3. Execute a compound/multi-intent query

print(run_agent(
    "I'm flying to Tokyo tomorrow. What's the current time there, "
    "the weather, and how much is 10000 INR in JPY?"
))

Here, we exposed three distinct functions, backed by three separate APIs, to natural language queries in real time using one common pattern. Inference runs entirely locally: the Gemma 4 instance uses no cloud service, and only the tool functions themselves reach out to external APIs.

What Makes Gemma 4 Different for Agentic AI?

Other open-weight models can call tools, but they do not always do so reliably, and this is what sets Gemma 4 apart. The model consistently produces valid JSON arguments, handles optional parameters correctly, and recognizes when to answer from its own knowledge instead of calling a tool. As you keep using it, keep the following in mind:

Schema quality is critically important. If your description field is vague, the model will have a hard time determining the arguments for your tool. Be specific about units, formats, and examples.
Gemma 4 honors the required array: it respects the required/optional distinction.
Once the tool returns a result, that result becomes context for the model through the "role": "tool" message you send in the final pass. The richer the tool result, the richer the response will be.
A common mistake is to return the tool result as "role": "user" instead of "role": "tool"; the model will then not attribute it correctly and may try to request the call again.

Conclusion

You have created a real AI agent that uses Gemma 4's function-calling feature, and it runs entirely locally. The agent uses the same architectural components found in production systems. Possible next steps include:

adding a file system tool that allows reading and writing local files on demand;
using a SQL database as a way to run natural language data queries;
creating a memory tool that writes session summaries to disk, giving the agent the ability to recall past conversations.

The open-weight AI agent ecosystem is evolving quickly. Gemma 4's native support for structured function calling gives you substantial autonomous functionality without any reliance on the cloud. Start small, build a working system, and the building blocks for your next projects will be ready for you to chain together.

Technical content strategist and communicator with a decade of experience in content creation and distribution across national media, Government of India, and private platforms.

