Technical SEO for Southeast Asian SMEs, what matters at 50 to 250 people

Technical SEO advice written for enterprise budgets doesn't work in Jakarta or Manila. Here's what actually moves the needle when your marketing team is three people.

A Malaysian e-commerce brand I spoke to last month told me their site had been rebuilt twice in 18 months. Both agencies promised "SEO-friendly builds." Both left them with a site Google struggled to crawl properly, schema that validated but didn't help, and JavaScript that rendered fine in Chrome but broke for bots. The marketing lead, running a team of three, spent six weeks cleaning up canonicals and redirect chains before she could even think about content.

This is the technical SEO reality for SMEs in Southeast Asia. You don't have a specialist on payroll. You're coordinating between an offshore dev team that's juggling five clients, a founder who wants the site to look like a SaaS unicorn, and you, the person expected to get organic traffic growing this quarter while fixing the plumbing nobody else can see.

I run technical audits for Series A through Series C companies across Indonesia, Singapore, Malaysia, and the Philippines. The broken patterns I see repeat across borders. This is the field guide I wish existed when I started doing this work 15 years ago.

If Google and AI crawlers can't reach it, nothing else matters

Your site needs to be crawlable before anything else works. Crawlability isn't glamorous, but it's binary. Either bots can access your pages or they can't.

Start with robots.txt. Go to `yoursite.com/robots.txt` right now. If you see `Disallow: /` under `User-agent: *`, you've blocked everything. If you see `Disallow: /` under `User-agent: GPTBot` or `User-agent: PerplexityBot`, you've opted out of those specific AI crawlers. For OpenAI, note that GPTBot is the training crawler and ChatGPT search uses a separate bot, OAI-SearchBot. That might be intentional. It might also be a default your dev team copied from a template and never revisited.

The robots.txt I recommend for small teams looks like this if you want AI visibility:

```text
User-agent: *
Allow: /

User-agent: GPTBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: ChatGPT-User
Allow: /

Sitemap: https://yoursite.com/sitemap.xml
```

If you're blocking bots, own that choice. If you're not blocking them on purpose, fix it this week.
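You can test a robots.txt before it goes live with Python's standard-library `urllib.robotparser`. It answers "may this bot fetch this URL?" for each crawler. This is a minimal sketch that assumes you paste in the file's text. The bot list mirrors the one above, with Googlebot added as a baseline.

Python's parser is simpler than Google's. It applies the first matching rule and doesn't expand `*` wildcards inside paths. Treat the result as a sanity check, not a guarantee of how any crawler behaves.

```python
from urllib.robotparser import RobotFileParser

# Crawlers named in this section, plus Googlebot as a baseline.
BOTS = ["Googlebot", "GPTBot", "PerplexityBot", "ChatGPT-User"]


def audit_robots(robots_txt: str, url: str) -> dict:
    """Return {bot: True/False} for whether each bot may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {bot: parser.can_fetch(bot, url) for bot in BOTS}


# A template that quietly blocks GPTBot while allowing everyone else.
SAMPLE = """\
User-agent: *
Allow: /

User-agent: GPTBot
Disallow: /
"""

if __name__ == "__main__":
    for bot, allowed in audit_robots(SAMPLE, "https://yoursite.com/products").items():
        print(f"{bot:15} {'allowed' if allowed else 'BLOCKED'}")
```

To check the live file instead of a pasted copy, `RobotFileParser` also provides `set_url()` and `read()`. On the sample above it reports GPTBot as BLOCKED and the other three as allowed. That is exactly the copied-template mistake described earlier.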

Next, check whether Googlebot is actually crawling. Open Google Search Console and go to Settings > Crawl stats. Spikes or drops in crawl requests without an obvious cause are worth investigating. If average response time is above 2 seconds, your server or site speed is slowing Google down. If Googlebot is requesting pages that shouldn't exist, you're leaking crawl budget to junk.

The other crawl problem I see across Southeast Asian SMEs is Cloudflare blocking AI bots by default. If you set up Cloudflare recently and never checked the bot settings, GPTBot and PerplexityBot are probably getting turned away at the firewall. Go into Cloudflare > Security > Bots and verify your allow list. You have to do this manually. The default blocks them.

Mobile rendering is not optional in this region

Desktop-first development still happens, especially when offshore agencies build your site on a MacBook Pro and ship it without testing on a $150 Android phone running a throttled 3G connection in Metro Manila or outer Jakarta.

Your users are on mobile. Search traffic in Southeast Asia is overwhelmingly mobile, with over 78% of e-commerce happening on phones. Google indexes mobile-first. If your mobile version is broken or slow, you're invisible.

Test your site's mobile performance in Search Console under Experience > Core Web Vitals. If you're failing on mobile, click through to see which pages and which metrics. The three Core Web Vitals that matter are LCP (Largest Contentful Paint), INP (Interaction to Next Paint), and CLS (Cumulative Layout Shift).

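LCP measures how fast the main content loads. Specifically, it measures when the largest image or text block in the viewport finishes rendering. Target is under 2.5 seconds. If you're failing LCP, your images are probably too large or unoptimized. Compress them. Serve them in WebP or AVIF format. Use lazy loading for images below the fold, but never for the main hero image, because lazy-loading the LCP element delays it.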

INP measures how fast the page responds to user input. Target is under 200ms. If you're failing INP, JavaScript is blocking the main thread. This is the hardest one to fix without dev time. Ask your developer to defer non-critical scripts or move them to a web worker.

CLS measures how much the page jumps around while loading. Target is under 0.1. If you're failing CLS, elements are loading without reserved space. Set explicit width and height attributes on images, ads, and embeds. Don't inject content above existing content after page load.
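Google grades each metric against fixed "good" and "poor" boundaries. It applies them at the 75th percentile of real page loads, not to a single lab run.

The sketch below applies those published boundaries to field measurements you supply, for example an export from your own real-user monitoring. The nearest-rank percentile is a simplification and may not match Google's aggregation exactly. The sample values are invented.

```python
import math

# Boundaries per metric: at or below `good` passes, above `poor` fails,
# anything in between is "needs improvement".
THRESHOLDS = {
    "LCP": (2.5, 4.0),   # seconds
    "INP": (200, 500),   # milliseconds
    "CLS": (0.1, 0.25),  # unitless layout-shift score
}


def p75(samples):
    """Nearest-rank 75th percentile of a non-empty list of measurements."""
    ordered = sorted(samples)
    return ordered[math.ceil(0.75 * len(ordered)) - 1]


def rate(metric, samples):
    """Return (p75 value, rating) for one Core Web Vital."""
    good, poor = THRESHOLDS[metric]
    value = p75(samples)
    if value <= good:
        return value, "good"
    if value <= poor:
        return value, "needs improvement"
    return value, "poor"


if __name__ == "__main__":
    # Illustrative mobile field samples, not real data.
    print(rate("LCP", [1.8, 2.1, 2.4, 3.9, 5.2]))  # (3.9, 'needs improvement')
    print(rate("INP", [90, 120, 180, 260]))        # (180, 'good')
    print(rate("CLS", [0.02, 0.05, 0.12, 0.31]))   # (0.12, 'needs improvement')
```

The percentile matters here. Four fast loads out of five do not save you if the slow quarter of your visitors sit on cheap phones and congested networks.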

Mobile speed also matters because page weight affects users on metered data plans. A 5MB homepage is expensive to load in Indonesia or the Philippines. Aim for under 1MB for the initial load. Tools like GTmetrix or PageSpeed Insights will tell you where the weight is coming from.
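If you'd rather see the weight breakdown yourself, your browser's developer tools can export a page load from the Network panel as a HAR file. The sketch below totals response body bytes per content type and compares them with the ~1MB target.

It assumes standard HAR 1.2 fields (`response.bodySize`, `response.content.mimeType`) and treats 1MB as 1,000,000 bytes. The sample entries and the `homepage.har` filename are hypothetical.

```python
import json
from collections import defaultdict

BUDGET_BYTES = 1_000_000  # the ~1MB initial-load target, in decimal bytes (assumption)


def weight_by_type(har: dict) -> dict:
    """Sum response body bytes per MIME type from a HAR 1.2 log."""
    totals = defaultdict(int)
    for entry in har["log"]["entries"]:
        response = entry["response"]
        mime = response.get("content", {}).get("mimeType", "unknown").split(";")[0]
        size = response.get("bodySize", -1)
        totals[mime or "unknown"] += max(size, 0)  # HAR uses -1 for "unknown"
    return dict(totals)


def load_har(path: str) -> dict:
    """Read a HAR file exported from the browser's Network panel."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


if __name__ == "__main__":
    # Tiny made-up HAR; in practice use load_har("homepage.har").
    sample = {"log": {"entries": [
        {"response": {"bodySize": 640_000, "content": {"mimeType": "image/jpeg"}}},
        {"response": {"bodySize": 410_000, "content": {"mimeType": "text/javascript"}}},
        {"response": {"bodySize": 38_000, "content": {"mimeType": "text/html; charset=utf-8"}}},
    ]}}
    totals = weight_by_type(sample)
    for mime, size in sorted(totals.items(), key=lambda kv: -kv[1]):
        print(f"{mime:20} {size / 1000:6.0f} KB")
    total = sum(totals.values())
    print(f"total {total / 1000:.0f} KB, {'over' if total > BUDGET_BYTES else 'within'} budget")
```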

Schema markup that actually moves a metric

Schema is one of the few technical SEO tactics where I can show you a visible lift. It makes your listings richer in Google and gives AI engines structured data they can parse without guessing.

The mistake I see is over-implementing schema on pages that don't matter, or implementing complex nested structures when simple JSON-LD would work.

For an SME, here's the schema priority list I walk through with every client. Start with Organization schema on your homepage. This connects your brand entity to your domain. Include your logo, social profiles, and contact info. Second, add LocalBusiness schema if you have physical locations. Include address, phone, and opening hours. Keep those details identical to your Google Business Profile, which is what drives map pack results.

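Third, implement Product schema on product or service pages. Include price, availability, and reviews if you have them. This makes the page eligible for rich snippets. Fourth, add Article schema on blog posts. Include author, date published, and date modified. This connects content to your brand entity. Fifth, add FAQ schema on pages with question-answer pairs. AI engines love this, and it's the easiest schema to get cited from. Since 2023, though, Google shows FAQ rich results only for well-known government and health sites, so for most SMEs the payoff is machine readability rather than a richer Google listing.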
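Hand-writing JSON-LD invites typos, so it can help to generate it from data you already hold. The sketch below builds Organization and Product objects and wraps them in the `<script type="application/ld+json">` tag that goes in the page.

Every business detail in it is a hypothetical placeholder. The property names (`logo`, `sameAs`, `offers`, `price`, `priceCurrency`, `availability`, `aggregateRating`) are standard schema.org vocabulary. Still, check Google's structured-data documentation for which properties each rich result actually requires.

```python
import json


def organization(name, url, logo, social_profiles, phone):
    """Organization entity for the homepage."""
    return {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": name,
        "url": url,
        "logo": logo,
        "sameAs": social_profiles,
        "contactPoint": {"@type": "ContactPoint", "telephone": phone,
                         "contactType": "customer service"},
    }


def product(name, url, price, currency, in_stock, rating=None, review_count=0):
    """Product entity; the rating block is added only when real reviews exist."""
    data = {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": name,
        "url": url,
        "offers": {
            "@type": "Offer",
            "price": f"{price:.2f}",
            "priceCurrency": currency,
            "availability": "https://schema.org/" + ("InStock" if in_stock else "OutOfStock"),
        },
    }
    if rating is not None and review_count > 0:
        data["aggregateRating"] = {"@type": "AggregateRating",
                                   "ratingValue": rating, "reviewCount": review_count}
    return data


def script_tag(data):
    """Wrap a schema object in the tag that goes into the page."""
    return ('<script type="application/ld+json">'
            + json.dumps(data, ensure_ascii=False) + "</script>")


if __name__ == "__main__":
    # Hypothetical brand and product, for illustration only.
    print(script_tag(organization(
        "Example Kopi Sdn Bhd", "https://yoursite.com", "https://yoursite.com/logo.png",
        ["https://www.instagram.com/examplekopi"], "+60-3-0000-0000")))
    print(script_tag(product("House Blend 500g", "https://yoursite.com/house-blend",
                             49.9, "MYR", True, rating=4.6, review_count=38)))
```

LocalBusiness follows the same pattern. Use `"@type": "LocalBusiness"` and add `address` (a `PostalAddress`), `telephone`, and `openingHoursSpecification`.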

You don't need BreadcrumbList, VideoObject, or Event schema unless you actually publish videos, run events, or have deep category hierarchies. Schema for schema's sake wastes dev time.

Validate your schema with Google's Rich Results Test. If it throws errors, fix them. If it validates but you're not seeing rich results after 4 weeks, the page either isn't ranking or Google decided the schema doesn't qualify. Move on.
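Before pasting URLs into the Rich Results Test, a quick local pass can catch JSON-LD that doesn't even parse. This sketch uses Python's built-in `html.parser` to pull out every `application/ld+json` block. It reports each block's `@type` or its JSON error, and it expands `@graph` containers.

It only checks syntax and the presence of `@type`. It is not a substitute for Google's validator.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the text of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._inside = False

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._inside = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.blocks[-1] += data  # data may arrive in several chunks


def check_jsonld(html: str) -> list:
    """Return one entry per schema object: its @type, or why it failed."""
    parser = JsonLdExtractor()
    parser.feed(html)
    results = []
    for raw in parser.blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError as err:
            results.append(f"invalid JSON: {err.msg}")
            continue
        if isinstance(data, dict) and "@graph" in data:
            items = data["@graph"]
        elif isinstance(data, list):
            items = data
        else:
            items = [data]
        for item in items:
            if isinstance(item, dict):
                results.append(item.get("@type", "missing @type"))
            else:
                results.append("not an object")
    return results


if __name__ == "__main__":
    page = ('<script type="application/ld+json">{"@type": "Organization", "name": "Example"}</script>'
            '<script type="application/ld+json">{"@type": "Product", "name": "Broken",}</script>')
    for message in check_jsonld(page):
        print(message)
```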

Canonicals and indexation, the invisible SEO leak

Canonical tags tell Google which version of a page is the primary one. If you have `yoursite.com/product` and `yoursite.com/product?utm_source=email`, both URLs exist but you want Google to index only the first. You add a canonical tag to the second pointing to the first.

The problem is canonical tags fail silently. If you set a canonical incorrectly, Google ignores it and picks its own version. Or worse, you canonicalize a page to a different page, and Google stops indexing the page you wanted.

Common canonical mistakes in Southeast Asian SME sites include every page canonicalizing to the homepage, which tells Google only the homepage matters. You've de-indexed your entire site. Another mistake is canonicals pointing to a different domain. If you copied code from another site or a staging environment, you might be canonicalizing to the wrong domain.

I also see canonicals on paginated pages all pointing to page 1. Google won't index pages 2, 3, 4 of your blog or product category. And finally, HTTPS canonicals on HTTP pages or vice versa. Pick one protocol and stick to it.

Check your canonicals by viewing page source and searching for `rel="canonical"`. The URL in the href attribute should match the page you're on, unless you genuinely want to consolidate multiple URLs to one.
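For more than a handful of pages, the same check can be scripted. The sketch below reads the `rel="canonical"` link from a page's HTML and flags the mistakes listed above:

- a missing tag
- a cross-domain target
- a protocol mismatch
- an inner page pointing to the homepage
- any other non-self canonical, which may be fine if you meant to consolidate

It treats `utm_` parameters and fragments on the page URL as tracking noise. That normalization rule is an assumption, so adjust it for your site. You supply the HTML, for example from `urllib.request` or a crawler export.

```python
from html.parser import HTMLParser
from urllib.parse import parse_qsl, urlencode, urljoin, urlsplit, urlunsplit


class CanonicalFinder(HTMLParser):
    """Grab the href of the first <link rel="canonical"> in the page."""

    def __init__(self):
        super().__init__()
        self.href = None

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        rels = (a.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rels and self.href is None:
            self.href = a.get("href")


def strip_tracking(url):
    """Drop utm_* parameters and the #fragment (assumed tracking noise)."""
    parts = urlsplit(url)
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if not k.lower().startswith("utm_")]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(query), ""))


def audit_canonical(page_url, html):
    """Return a list of issues with the page's canonical tag (empty if clean)."""
    finder = CanonicalFinder()
    finder.feed(html)
    if not finder.href:
        return ["no canonical tag"]
    canonical = urlsplit(urljoin(page_url, finder.href))  # resolve relative hrefs
    page = urlsplit(strip_tracking(page_url))
    issues = []
    if canonical.netloc.lower() != page.netloc.lower():
        issues.append(f"points to another domain: {canonical.netloc}")
    if canonical.scheme != page.scheme:
        issues.append(f"protocol mismatch: {page.scheme} page, {canonical.scheme} canonical")
    if canonical.path in ("", "/") and page.path not in ("", "/"):
        issues.append("inner page canonicalizes to the homepage")
    elif urlunsplit(canonical) != urlunsplit(page) and not issues:
        issues.append(f"not self-referencing: {urlunsplit(canonical)}")
    return issues


if __name__ == "__main__":
    html = '<head><link rel="canonical" href="https://yoursite.com/"></head>'
    print(audit_canonical("https://yoursite.com/blog/post-1", html))
```

A "not self-referencing" result is a prompt to confirm intent, not automatically an error. Paginated pages all pointing to page 1 will show up this way.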

For indexation, open Google Search Console and go to Indexing > Pages. The graph shows indexed vs not indexed. Click "Not indexed" and review the reasons. The big ones I see in every audit are "Crawled, currently not indexed" (Google visited but decided not to index, usually a quality or duplicate content signal), "Discovered, currently not indexed" (Google found the URL but hasn't crawled it yet, low priority page or crawl budget issues), "Blocked by robots.txt" (you told Google not to crawl, might be intentional or a mistake), and "Duplicate without user-selected canonical" (Google thinks this page is a duplicate of another and picked a different one as the canonical).

If you have thousands of "Crawled, currently not indexed" pages, you probably have thin content or a large site with quality issues. The fix is not to ask Google to recrawl. The fix is to improve the content or remove the pages.

Internal linking as crawl steering

Internal links are how Googlebot and AI crawlers navigate your site. They're also how you signal importance. Pages linked from your homepage and main navigation get crawled more often and rank better than orphan pages buried five clicks deep.

The internal linking pattern I see break down in SME sites is the blog. You publish 50 posts, none of them link to each other, and none of them link to your product or service pages. Organic visitors land on the blog, read one post, and leave. The blog isn't feeding the funnel because there's no internal link bridge.

The fix is simple. Every blog post should link to at least one service or product page where relevant, and at least two other related blog posts. Use descriptive anchor text. Don't use "click here" or "read more."

For your most important pages (product pages, service pages, the pages that convert), count how many internal links point to them. If they have fewer than 5 internal links, they're under-supported. Add contextual links from blog posts, from your main navigation, from your footer if it makes sense.

Tools like Screaming Frog or Sitebulb will crawl your site and show you internal link counts per page. If you don't have a tool budget, pull your top 20 revenue-driving pages from Google Analytics. Then check the Internal links section of Search Console's Links report to see how many links point to each one.
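If you can get your pages' HTML, for example from a CMS export or a simple fetch script, you can compute inlink counts without a paid crawler. The sketch below collects `<a href>` links and their anchor text from each page. It counts how many distinct internal pages link to each URL and flags generic anchors like "click here".

It assumes you pass in a dict of `{absolute_url: html}`. The five-link threshold is this section's rule of thumb. The generic-anchor list is a hypothetical starting point.

```python
from collections import defaultdict
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlsplit

GENERIC_ANCHORS = {"click here", "read more", "here", "learn more"}  # extend as needed
MIN_INLINKS = 5


class LinkCollector(HTMLParser):
    """Collect (href, anchor text) pairs from one page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = " ".join("".join(self._text).split())  # collapse whitespace
            self.links.append((self._href, text))
            self._href = None


def audit_internal_links(pages):
    """pages: {absolute_url: html}. Returns (inlink counts, generic-anchor hits)."""
    sources = defaultdict(set)  # target URL -> pages that link to it
    generic = []
    for page_url, html in pages.items():
        collector = LinkCollector()
        collector.feed(html)
        host = urlsplit(page_url).netloc
        for href, text in collector.links:
            target = urldefrag(urljoin(page_url, href)).url
            if urlsplit(target).netloc != host or target == page_url:
                continue  # skip external links and self-links
            sources[target].add(page_url)
            if text.lower() in GENERIC_ANCHORS:
                generic.append((page_url, target, text))
    counts = {url: len(sources.get(url, ())) for url in pages}
    return counts, generic


if __name__ == "__main__":
    pages = {  # hypothetical four-page site
        "https://yoursite.com/": '<a href="/services/audit">Technical SEO audit</a>',
        "https://yoursite.com/blog/a": '<a href="/services/audit">click here</a>',
        "https://yoursite.com/services/audit": '<a href="/">Home</a>',
    }
    counts, generic = audit_internal_links(pages)
    for url, n in sorted(counts.items(), key=lambda kv: kv[1]):
        if n < MIN_INLINKS:
            print(f"under-supported ({n} inlinks): {url}")
    for source, target, text in generic:
        print(f'generic anchor "{text}" on {source} -> {target}')
```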

The crawl-log audit for sites over 1,000 pages

If your site is under 500 pages, skip this section. You don't need it yet.

If your site is over 1,000 pages, you need to know what Googlebot is actually crawling versus what you want it to crawl. The only way to know is to parse your server logs.

In the crawl-log audits I run, the pattern I find in almost every large SME site is Googlebot spending 30 to 40 percent of its crawl budget on pages that don't need to be crawled. Faceted navigation URLs, paginated archives, old staging paths, parameter variations. These pages leak crawl budget and delay indexation of pages that matter.
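As a starting point, the sketch below reads access-log lines in the common "combined" format. That is the usual Apache/Nginx layout, but yours may differ. It keeps requests whose user agent claims to be Googlebot and counts how many hit parameter URLs, paginated archives, or path prefixes you consider junk.

The prefix list is hypothetical, so replace it with your own faceted, archive, and staging paths. User-agent strings can be spoofed. For a real audit, verify Googlebot hits with a reverse DNS lookup before trusting the numbers.

```python
import re
from collections import Counter

# Combined log format: ip ident user [time] "request" status bytes "referer" "agent"
LINE = re.compile(
    r'\S+ \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)
# Hypothetical low-value path prefixes; replace with your own.
JUNK_PREFIXES = ("/staging/", "/tag/", "/search")


def classify(path):
    """Bucket a requested path into a rough URL class."""
    if "?" in path:
        return "parameter URL"
    if path.startswith(JUNK_PREFIXES):
        return "junk prefix"
    if re.search(r"/page/\d+/?$", path):
        return "paginated archive"
    return "clean"


def googlebot_crawl_mix(lines):
    """Count claimed-Googlebot requests per URL class."""
    mix = Counter()
    for line in lines:
        m = LINE.match(line)
        if m and "Googlebot" in m.group("agent"):
            mix[classify(m.group("path"))] += 1
    return mix


if __name__ == "__main__":
    ua = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
    log = [  # made-up sample lines
        f'66.249.66.1 - - [10/Oct/2024:13:55:36 +0700] "GET /products/kopi HTTP/1.1" 200 5120 "-" "{ua}"',
        f'66.249.66.1 - - [10/Oct/2024:13:55:40 +0700] "GET /products?color=red HTTP/1.1" 200 4800 "-" "{ua}"',
        f'66.249.66.1 - - [10/Oct/2024:13:56:02 +0700] "GET /blog/page/7 HTTP/1.1" 200 3900 "-" "{ua}"',
    ]
    mix = googlebot_crawl_mix(log)
    total = sum(mix.values())
    for label, n in mix.most_common():
        print(f"{label:18} {n:5}  {n / total:.0%}")
```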

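The fix is to keep Googlebot off the junk, usually with robots.txt rules. A noindex tag still has to be crawled to be seen, so on its own it doesn't save crawl budget. Then clean up your internal links so you're not linking to low-value URLs, and update your sitemap to include only the pages you want indexed.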

This is not beginner work. If you're not comfortable with log files and Python, hire help. But if you have thousands of pages and you're seeing indexation delays, this is the first place I'd look.

XML sitemaps, the map Google sometimes ignores

Your XML sitemap is a list of URLs you're asking Google to crawl and index. It doesn't guarantee indexation. It's a suggestion.

Generate your sitemap with your CMS (WordPress, Shopify, Webflow all do this automatically) or a plugin. Submit it in Google Search Console under Sitemaps. Check back a week later to see how many URLs Google discovered versus indexed.

If Google discovered 1,000 URLs and indexed 400, the other 600 are either low quality, duplicate, or blocked. Go back to the "Pages" report in Search Console and diagnose.

Common sitemap mistakes include putting in URLs that redirect (sitemaps should only list canonical, indexable URLs with 200 status codes), including noindexed pages (if the page has a noindex meta tag, don't put it in the sitemap), making the sitemap too large (Google's limit is 50,000 URLs per sitemap file or 50MB uncompressed, split into multiple sitemaps if you're over that), and not updating the sitemap after site changes (if you delete or move pages, regenerate and resubmit your sitemap).
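If your CMS's sitemap includes URLs it shouldn't, a small script can rebuild it from a crawl export. The sketch below keeps only URLs that returned 200, aren't noindexed, and canonicalize to themselves. It then writes standard sitemap XML in the sitemaps.org 0.9 namespace, splitting into multiple files at 50,000 URLs.

The input record format, a `(url, status, noindex, canonical)` tuple, is a hypothetical shape, so adapt it to whatever your crawler exports. The script does not check the 50MB limit separately.

```python
import xml.etree.ElementTree as ET

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000  # per-file limit; files must also stay under 50MB uncompressed
DECLARATION = '<?xml version="1.0" encoding="UTF-8"?>\n'


def indexable(records):
    """Yield URLs that returned 200, are not noindexed, and are self-canonical."""
    for url, status, noindex, canonical in records:
        if status == 200 and not noindex and canonical in (None, url):
            yield url


def build_sitemaps(urls):
    """Return one XML string per sitemap file, each with at most MAX_URLS entries."""
    urls = list(urls)
    files = []
    for start in range(0, len(urls), MAX_URLS):
        urlset = ET.Element("urlset", xmlns=NS)
        for url in urls[start:start + MAX_URLS]:
            ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = url  # & is escaped
        files.append(DECLARATION + ET.tostring(urlset, encoding="unicode"))
    return files


if __name__ == "__main__":
    crawl = [  # hypothetical crawl export
        ("https://yoursite.com/", 200, False, "https://yoursite.com/"),
        ("https://yoursite.com/old-page", 301, False, None),
        ("https://yoursite.com/thank-you", 200, True, None),
        ("https://yoursite.com/product?utm_source=email", 200, False, "https://yoursite.com/product"),
        ("https://yoursite.com/product", 200, False, "https://yoursite.com/product"),
    ]
    for xml in build_sitemaps(indexable(crawl)):
        print(xml)
```

If you end up with more than one file, list them in a sitemap index file (a `<sitemapindex>` document) and submit that in Search Console instead of each file separately.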

For AI crawlers, your XML sitemap helps but is not required. GPTBot and PerplexityBot will follow your sitemap if it's declared in robots.txt, but they also crawl your HTML links. The sitemap accelerates discovery.

Redirect chains and 404s, the low-hanging cleanup work

A redirect chain happens when URL A redirects to URL B, which redirects to URL C. Google and users follow the chain, but every hop adds latency. Googlebot gives up after 10 hops and reports a redirect error.

Check for redirect chains with Screaming Frog or by manually testing URLs you've moved. Fix them by updating the first redirect to point directly to the final destination.
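If your redirects live in a list you control, such as a CMS plugin export or a spreadsheet, flattening chains is a small graph walk. Follow each source to its final destination and point it there directly.

This is a minimal sketch, assuming a `{from_path: to_path}` dict. It also reports loops, which never resolve and need a human decision.

```python
def flatten_redirects(redirects):
    """Map every source straight to its final target.

    redirects: {source_path: target_path}. Returns (flattened, loops) where
    loops lists sources whose chain cycles back on itself.
    """
    flattened, loops = {}, []
    for source in redirects:
        seen = {source}
        target = redirects[source]
        while target in redirects:  # the target itself redirects: keep following
            if target in seen:
                loops.append(source)
                break
            seen.add(target)
            target = redirects[target]
        else:  # loop ended without a cycle: `target` is the final destination
            flattened[source] = target
    return flattened, loops


if __name__ == "__main__":
    rules = {"/promo-2023": "/promo", "/promo": "/sale", "/sale": "/collections/sale",
             "/old-a": "/old-b", "/old-b": "/old-a"}  # hypothetical redirect list
    flat, loops = flatten_redirects(rules)
    for source, target in flat.items():
        if target != rules[source]:
            print(f"update {source}: {rules[source]} -> {target}")
    print("loops:", loops)
```

Every rule whose target changes was part of a chain. Updating those rules is the "point the first redirect at the final destination" fix.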

404 errors are pages that no longer exist. A few 404s are normal. Hundreds or thousands signal a broken site structure. Check Google Search Console under Indexing > Pages for "Not found (404)" errors. If you see old product pages, blog posts, or category pages returning 404, decide whether to restore them, redirect them to a relevant page, or leave them as 404 if they're genuinely obsolete.

For pages that used to rank and still get traffic, redirect them. For pages nobody links to and nobody visits, let them 404. Don't redirect every dead page to your homepage. That's a soft 404 and Google treats it as low quality.

When to fix technical SEO versus when to ship content

You're running a small team. You have competing priorities. Here's how I decide what to fix first.

If Google can't crawl your site, fix that before anything else. Check robots.txt, check Cloudflare, check server errors. Crawlability is binary. You're either accessible or you're not.

If Google is crawling but not indexing, check canonicals, check for duplicate content, check for noindex tags you didn't intend. Indexation problems hide your content completely.

If you're indexed but not ranking, the problem is probably not technical. It's content, backlinks, or competition. Don't chase technical perfection when the real gap is that you have 10 blog posts and your competitor has 200.

If you're ranking but your click-through rate is low, add schema, improve your title tags and meta descriptions, get FAQ schema on pages that answer questions.

Technical SEO is the foundation, but it's not the whole house. I see teams spend 8 weeks fixing technical debt on a 200-page site when they'd get more lift from publishing 20 new pieces of content. Balance is everything.

The technical foundation I'd build for an SME in Southeast Asia, assuming zero budget for tools and 4 hours a week of internal time, includes:

- robots.txt allowing Googlebot and AI crawlers
- mobile-first design that passes Core Web Vitals on mobile
- Organization and LocalBusiness schema on the homepage
- Product or Article schema on your top 20 pages by business value
- clean canonicals (every page canonicalizes to itself unless you're consolidating)
- an XML sitemap with only indexable pages, submitted to Search Console
- internal links from blog posts to product pages and between related posts
- no redirect chains longer than one hop
- 404 cleanup for pages that used to rank

That list is achievable in 30 days if you have developer access and a CMS that doesn't fight you. Everything else (log-file analysis, advanced schema, crawl-budget optimization) is a second-quarter project.

If I'm working with a Series A founder in Jakarta or Manila, this is the checklist I run in the first 30-minute strategy call. We look at Search Console together, I spot the top three blockers, and I tell them what to fix this month versus what can wait. Technical SEO at SME scale is triage, not perfection.

The sites that win in Southeast Asia are not the ones with flawless technical scores. They're the ones that load fast on a cheap phone, that give Google and ChatGPT something clear to index, and that connect their content to their entity clearly enough that machines understand what the business does. Get that right and you buy yourself room to compete.

If you're running marketing for a Southeast Asian company and need someone to audit your technical setup or train your team on what actually matters at your scale, I do both as a consultant. The work is faster when someone has seen the same broken patterns 50 times before.

