Gemini API File Search is now multimodal

(blog.google)

152 points | by gmays 1 day ago

18 comments

  • FrequentLurker 1 day ago
    This might be great and all but I am still miffed at how simple search on AI Studio is. You can only search the titles of your conversations and nothing inside them. On top of that they messed with the scrolling so Ctrl+F doesn't work reliably.
    • pants2 1 day ago
      It's incredible how far behind Gemini has gotten, both the product and the model. Even the ChatGPT plugin for Google Sheets blows away the native Gemini integration.

      Everyone thought Google was pulling ahead with Gemini 3. For a minute there they had the best language model, image model, AND video model in the world. But it's like they decided to pull over for a nap while OpenAI and Anthropic flew by.

      • bachmeier 1 day ago
        Maybe they've decided they don't want to play the same game as OpenAI and Anthropic? They're much better positioned for the high volume AI work that's likely to be where the money is made, with calls to APIs doing routine things for all the businesses of the world. They're also the only big US player that has an open model that you can build on. I don't think vibe coding or the most cutting edge capabilities are what will determine profit from AI.
        • stingraycharles 1 day ago
          > They're much better positioned for the high volume AI work that's likely to be where the money is made, with calls to APIs doing routine things for all the businesses of the world

          How, exactly, are they currently conquering the enterprise world with their models? What do you think Anthropic is doing?

          Their latest proper model is a year old; they have no moat and no enterprise commitment.

          Your comment would make sense if they had actual success in the enterprise market and actual products in that area, but they don't.

          They had a brief sprint, caught up, and then dropped the ball again.

          Their only current moat is their TPUs, and the fact that

          1. The whole (successful) LLM world is screaming for capacity

          2. They have excess capacity to rent out, just like Grok

          That tells you everything.

          • lukeschlather 1 day ago
            > Their latest proper model is a year old

            What's a "proper model?" Gemini 3.1 Pro was released 3 months ago. Gemini Robotics 1.6 was released a month ago. And Google is vertically integrated, they aren't just selling tokens, they are selling Taxi rides with Waymo. AI is a lot more than LLMs and Google is doing a lot more than LLMs.

          • macNchz 23 hours ago
            If you're building on top of APIs and can do some eval work (i.e., you don't need the most bleeding-edge model), the Gemini Flash and Flash Lite models are super capable for the price.
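
            A minimal sketch of what that eval work can look like, assuming a toy `call_model` stand-in for whatever provider client you actually use (all names and canned answers below are hypothetical, not a real API):

```python
# Toy stand-in for a real provider client; swap in an actual API call.
def call_model(model: str, prompt: str) -> str:
    canned = {
        "flash": "The capital of France is Paris.",
        "flash-lite": "paris",
    }
    return canned.get(model, "")

# Small "golden set" of (prompt, expected-substring) pairs.
GOLDEN_SET = [
    ("What is the capital of France?", "paris"),
]

def accuracy(model: str, golden: list[tuple[str, str]]) -> float:
    # Substring matching is crude; real evals often use graders or exact parsing.
    hits = sum(
        expected in call_model(model, prompt).lower()
        for prompt, expected in golden
    )
    return hits / len(golden)

# Compare cheap models against the golden set before committing to one.
for model in ("flash", "flash-lite"):
    print(f"{model}: {accuracy(model, GOLDEN_SET):.0%}")
```

            The point is only that a small fixed prompt set with a scoring function is often enough to tell whether a cheaper model clears your quality bar.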
          • bachmeier 1 day ago
            > How, exactly, are they currently conquering the enterprise world with their models?

            I didn't say they were conquering the enterprise world. I said they are better positioned for the work that will be profitable in the future. Winning will mean being "good enough" for things like routine interactions with customers at the lowest cost to the business, and having customers fine tune your models using your hardware.

            > What do you think Anthropic is doing?

            Aside from being arrogant jerks that don't care about pissing off their customers, they're positioning themselves as the highest price provider for the highest end work. There will be a market for that, and maybe Anthropic will survive, but Google looks to me like they have a shot at being the profitable AI company.

        • Computer0 1 day ago
          GPT-OSS is still decent, I think, at least when I need a local LLM.
      • diegoperini 1 day ago
        I have the opposite experience: Gemini (even the Flash models) is the only useful model for my reverse-engineering-related use case. My hunch is that Google uses its free access to the entire Google search index to train on niche non-English community websites, more frequently and in a "relevant" manner, which in the end gives these models the most up-to-date info for this particular kind of work. Every other model is either 10 years out of date with its answers or simply hallucinates like crazy.
        • embedding-shape 1 day ago
          > for my reverse engineering related use case [...] Every other model is just either 10 years outdated with their answers

          I've mostly been doing reverse engineering with Codex, mostly related to games, and not once has the training-data cut-off date been in the way. The most useful part comes from handing it a binary/directory and letting it prod at it until it finds the answer you're looking for. I don't even have web search enabled, and sometimes it takes 30-40 minutes, but I never saw it fail to find an answer because its training data was a couple of years old.

      • comboy 1 day ago
        3.1 Pro is still very capable, and the API is competitively priced vs. e.g. Anthropic; they just can't seem to figure out RLHF and the harness. It needs a lot of guiding; by default it tends to be lazy and sticks poorly to instructions.

        It just feels like many Google products, really: they're capable of amazing things, it's just that nobody there seems to care. I would guess they're optimizing more for internal use than for their vast userbase.

        • logicchains 1 day ago
          They optimize for making their SREs' lives easier by quantizing models, regardless of how negative an effect that has on the user.
      • wilj 1 day ago
        I just cancelled my Gemini subscription yesterday. I have a big private fork of OpenCode, and I did it the wrong way to start with, so I couldn't pull from upstream.

        So I put together a plan for refactoring it, step by step, with tests, etc. After literally 8 solid days of fighting with Gemini 3 Pro, I still couldn't pull it off.

        I gave GPT 5.5 a chance with the same prompt, plans, and repo. I'm not sure how long it took, but when I checked in on it a few hours later it was done. All tests passed, everything exactly how I'd asked, and better (it made some improvements).

      • jmathai 1 day ago
        My non technical wife knows both ChatGPT and Anthropic (admittedly, because of me) but doesn’t know Gemini. This is amazing to me.

        Surely she has seen Gemini in Google search but even her use of that is plummeting.

        Google has so much revenue that they'll be around for a long time. But I feel they are fumbling the AI opportunity. Even at work, where we have Gemini, the conversation is entirely about Claude. No one talks about Gemini.

        • panarky 1 day ago
          > My non technical wife ...

          Reports of the death of Google Search have been greatly exaggerated.

          If you believe all the reports on HN about everyone's non-technical wives and grandmas, you'd have a hard time explaining the all-time highs in global usage and revenue from Google Search.

          I agree with you that Claude 4.7 Opus is better than Gemini 3.1 Pro, but it's also a lot more expensive.

          For my applications, I can't find better price-performance than Gemini 3.0 Flash. And it hasn't even been upgraded to 3.1 yet.

          I suspect Google's target is price-performance and not just raw performance, which is how they can serve LLM responses at Google Search scale and still set an all-time record for quarterly earnings of any public company ever.

          Frontier model capabilities leapfrog each other every few months, and Google I/O is in ten days, so I expect the leaderboard will change again soon.

          • jonhohle 1 day ago
            Unfortunately, I think Google is in the process of killing the golden goose. I visit so few unrecognized websites now, and primarily rely on "AI mode" to answer my specific question rather than sift through a handful of possibly accurate pages. How long can that go on before those sites just no longer exist and the source of that knowledge, or of new knowledge, evaporates? That model doesn't seem sustainable long term.
            • bachmeier 1 day ago
              Honestly, I think the SEO virus killed that golden goose long before the first AI chat bot. If we still had good search taking us to sane websites, ChatGPT might well have never been a thing. I was posting (including on HN) about the vulnerability of Google's search business years before AI chat. It just happens to be the thing that filled the gap when usable search disappeared.
        • jorvi 1 day ago
          OpenAI and Anthropic have no moat. DeepSeek is a drop-in replacement that comes really close in performance at 7.5-20% of the cost, and that cost will keep getting pushed down by the Chinese labs. And bizarrely enough, their models are more secure to use because they're open-source, open-weights.

          OpenAI and Anthropic are going to get crushed long-term, and their investors are going to take a horrendous haircut.

          On the other hand, Google and Microsoft already have the users (and lock-in). They just need to funnel them into Gemini and CoPilot.

          • discordance 16 hours ago
            I wish DeepSeek were a drop-in replacement, but it's not. It performs amazingly well but it's not as autonomous and needs a lot more nudges compared to Opus4.6/7 or GPT 5.5. It's good enough for a lot of things (text extraction, sentiment analysis, classifying things) but not on the same level for code gen.
          • pants2 1 day ago
            DeepSeek r1 affected markets because for a little while people bought this, but it's not true for so many reasons. Sending data to China is out of the question for every American Enterprise. OAI and Anthropic have rich product suites and API harnesses that make DS far from a "drop in replacement." They have better models, generous usage limits, domination of the zeitgeist, integrations with Slack and all the Enterprise SaaS platforms, and magnitudes more GPU capacity than DS. What you say simply isn't true.
            • jorvi 20 hours ago
              > What you say simply isn't true.

              Everything you said is wrong.

              - DeepSeek is on V4 now, R1 is ancient history

              - The models are open-source, open-weights, which means you can inspect what they do and you can choose US or EU infra providers

              - DeepSeek literally has a Claude-compatible endpoint

              Please don't comment and confuse other users on topics you know nothing about. Study. Then speak.

              • pants2 16 hours ago
                - Right, R1 affected markets because the market originally believed your theory, but it doesn't any more, which is why V4 didn't move markets at all.

                - Sure, you can use US infra providers. Together.ai is a good US provider but then it's 15X more expensive than DeepSeek's Chinese-subsidized pricing. It's really not that attractive at that price point. Anthropic and OpenAI are focused on larger models, but Grok 4.3[1] is smarter and significantly faster + cheaper than DS4[2] and by a wide margin.

                - DeepSeek has a Claude-compatible messages API, but that's trivial. Anthropic has a massive API platform with things like Sessions, Files, and Agents[3]. None of those are available on DeepSeek.

                1. https://artificialanalysis.ai/models/grok-4-3

                2. https://artificialanalysis.ai/models/deepseek-v4-pro

                3. https://platform.claude.com/docs/en/api/overview

                • jorvi 1 hour ago
                  Now we're talking!

                  - V4 will definitely move markets, especially as Claude and OpenAI keep jacking up the prices more and more. But inertia exists. Give it time.

                  - Most US infra providers are ~5x more expensive than Chinese infra, not 15x. But yes, you are right; it does erode the cost advantage significantly. Big asterisk: V4 seems to have a solid cache-hit rate, often in the high 90s.

                  - Grok (and Llama) always underperform relative to their benchmark and leaderboard results. Don't ask me why, but it's a persistent pattern my colleagues and I have noticed. I'll give them another try though; more competition is better.

                  - DeepSeek themselves have specifically said they prefer developing high performance models that you plug into other tooling, including Claude's. Regardless, I think it's unrealistic to expect DeepSeek to offer 1:1 suite compatibility with Claude or OpenAI. You wouldn't expect that from OpenAI <> Claude either.
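
                  Back-of-envelope on the cache-hit point above: with illustrative (not real) prices, a large raw per-token gap shrinks quickly once most input tokens are served from cache at a discount.

```python
# Blended input-token price under prompt caching. All numbers illustrative.
def effective_cost(base_price: float, cached_discount: float, hit_rate: float) -> float:
    """Per-token price when `hit_rate` of tokens hit the cache and are
    billed at `cached_discount` times the base price."""
    return base_price * (hit_rate * cached_discount + (1.0 - hit_rate))

# Provider A: 5x the raw price, but 90% cache hits billed at 10% of base.
a = effective_cost(5.0, 0.10, 0.90)   # 5 * (0.09 + 0.10) = 0.95
# Provider B: cheap raw price, only 50% cache hits at the same discount.
b = effective_cost(1.0, 0.10, 0.50)   # 1 * (0.05 + 0.50) = 0.55
print(f"effective ratio: {a / b:.2f}x")  # far below the 5x sticker gap
```

                  With these made-up numbers the 5x sticker-price gap collapses to under 2x, which is why cache-hit rates matter so much in these comparisons.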

      • riddlemethat 1 day ago
        It's still the best option for uptime and document analysis (on a cost basis), and Google is less likely to suffer a significant cybersecurity breach than a less established company. They'll be fine as long as they stay in the game; even if they never have a Ferrari again, plenty of people buy Toyotas.
      • thefounder 1 day ago
        I never felt Gemini was ever better than OpenAI or Anthropic. I think it's more on par with the open-source models than with the top two.
    • qingcharles 1 day ago
      I've come across a few weird search issues like this with Google lately. Entire company built on the best search engine ever created; can't do search properly in their apps.
    • stingraycharles 1 day ago
      Yeah, it's surprising. Claude Desktop has had project files for ages, which are chunked/indexed and automatically injected into your context based on the topic.

      You’d think this would be fairly obvious for Google to do, but it’s probably an organizational problem rather than a technical one.

    • sega_sai 1 day ago
      The search in the Gemini web app is so embarrassingly bad that I get the impression nobody of importance at Google uses it; otherwise they would have fixed it long ago.
    • greesil 1 day ago
      Too bad they can't just easily vibe code new features.
      • telotortium 1 day ago
        Ironically, they probably could if they used Codex or Claude Code. Those harnesses and models have been good enough for that since late last year, and they keep getting better. However, it seems that neither DeepMind nor anyone else at Google has access to either of them.
      • bloqs 1 day ago
        Yeah, whatever happened to "no more SWEs"?
    • varispeed 1 day ago
      I am more miffed that you cannot delete conversations.
  • noashavit 6 hours ago
    The race for unstructured data continues. It feels like everyone is trying to crack unstructured data extraction, with the underlying goal of using AI to classify and tag insights from unstructured data into structured data/graphs that agents can consume and traverse.
  • lousken 1 day ago
    Haven't touched the Gemini API since they didn't support a $ limit per API key. Is that possible now?
  • ecommerceguy 1 day ago
    My free trial ends this week, which i'm obviously canceling.
  • FirstPoint 1 day ago
    It's a striking irony that the world's leader in search is taking so much heat for poor search functionality and UX in its own flagship AI products.
    • WarmWash 1 day ago
      One of Google's core problems is internal silos of talent. The search team has likely never interacted with the Gemini app team, or perhaps even with the Gemini app.

      For all intents and purposes Google Gemini is a totally separate company from Google search.

      • kyrra 1 day ago
        Google has ~190k employees. You can't have total collaboration across that many people.

        Teams will cross-collaborate, but it has to be for specific projects with specific people.

  • ninjagoo 1 day ago
    Is anyone tech-savvy going to actually let any tool with this backend run on their personal PCs?

    Any app with this behind the scenes is a non-starter for me.

    And anyone think that all those folks ditching Win11 will be going for or recommending any app built on this?

  • thawab 16 hours ago
    Tried multiple times to use the File Search API and it's complex to set up. Ended up going with a different approach.
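
    For context, one "different approach" when a hosted file-search store feels like overkill is rolling your own retrieval. This is a rough stdlib-only sketch (bag-of-words cosine similarity, not what the commenter actually built, and far cruder than real embedding-based search):

```python
# Toy DIY file search: bag-of-words cosine similarity, stdlib only.
import math
import re
from collections import Counter

def vectorize(text: str) -> Counter:
    # Lowercase word counts as a sparse vector.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def search(query: str, docs: dict[str, str], k: int = 3) -> list[str]:
    # Rank document names by similarity to the query.
    q = vectorize(query)
    ranked = sorted(docs, key=lambda name: cosine(q, vectorize(docs[name])),
                    reverse=True)
    return ranked[:k]

docs = {
    "billing.md": "invoices payments refunds billing cycle",
    "auth.md": "login tokens oauth sessions passwords",
}
print(search("how do refunds work", docs, k=1))  # → ['billing.md']
```

    Swapping the word-count vectors for embeddings from any model gets you most of the way to what the hosted offerings do, with full control over chunking and storage.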
  • trilogic 1 day ago
    Good to have a choice between cloud and local use.

    How much would you pay to have this as yours forever, running locally, GDPR- and HIPAA-compliant, without the headache of privacy concerns or subscriptions?

    That's what we offer with HugstonOne, and we did it before Google. Multimodal, lightning-fast RAG, terabytes not kilobytes :)

    All you need is a 32 GB RAM laptop and HugstonOne; it's not rocket science.