Muschamp Rd

Baidu: China’s Search Engine

October 31st, 2005
China

First of all I’d like to again thank Guo Dan 郭耽, Zhu Danna 朱丹娜, Jeff Tang, Prof. Dr. Max von Zedtwitz and anyone else who helped me get to interview Guo Dan and Jeff from Baidu. Furthermore, they were cool enough to let me blog about it. Though not bloggers themselves, they were quite stoked at having beaten BlogCN in basketball the day before.

This blog posting is based on a rough draft of a paper I will be writing for my Managing Global Innovation class at Tsinghua School of Economics and Management. The actual interview took place on October 20th, 2005. I eventually made the actual paper available as a PDF download, as some people will probably prefer the information in that form. If it means anything, I got the highest mark in the class for my presentation and paper, or so I was told by a fellow exchange student.

Me at Baidu

For those not in the know, Baidu is the leading Chinese language search engine on the Internet. They are regularly listed by Alexa as one of the most popular sites on the Internet. Their recent successful IPO on the NASDAQ stock exchange has catapulted them to international renown. Although primarily operating in China, they compete globally for business against other search engines such as Yahoo, Google, and MSN.com. They also compete with many other Chinese internet properties for advertising dollars and staff. Advertising, including ‘pay for placement’, is the primary source of revenue for Baidu.

Guo Dan is one of the first employees hired by Baidu. At Baidu, research is divided into two major areas. One area focuses exclusively on Baidu’s spider and its indexing efforts. This is referred to as the Search group. Guo Dan heads up the other side, which they call Search Channels. This is basically all the other features of Baidu.com, including the searching and indexing of MP3s, discussion forums, and news sites.

Jeff is a senior engineer at Baidu. He was recruited to move back to China from Silicon Valley. In the Valley he had worked for Netscape, AOL, and several smaller startup companies. He is primarily focused on ERP and workflow improvements, both inside of R&D and in how R&D integrates into the greater Baidu operation.

Guo Dan describes Baidu as a technology company first and foremost, and thus R&D is at the heart of their operation and always will be. Five out of eight original Baidu employees were trained as engineers. R&D investment is 12% of yearly revenues. According to Guo Dan, the culture of Baidu and its non-traditional workplace, with its relaxed atmosphere, is crucial to attracting and retaining high quality employees.

One of the key questions I put to Guo Dan was whether technology was developed and pushed onto the market or whether market demand or a niche was identified then R&D was directed to develop a product. At Baidu it seems to be a mixture of the two strategies with approximately 50% of products developed as pure R&D and the other half developed to fill a particular niche or request from advertisers/end-users.

I also inquired whether improving the quality of the data indexed by Baidu was the focus of research, or whether increasing the size of the data indexed by Baidu was the primary driver while spidering/indexing the web. Improving the quality of the index is the focus of R&D. This includes ensuring the freshness of the results, eliminating duplicate data, and combating the spammers who attempt to influence the search results for their own financial gain. Guo Dan revealed that Chinese spammers are much more clever and dedicated than American spammers, and thus spamming is possibly a far bigger problem for Baidu than it is for, say, Google.

Every feature developed at Baidu is ultimately aimed at advertisers. Everything is free for the end-user. The goal of R&D is to develop useful tools and services for Chinese consumers, then recoup development costs and generate future revenue streams through the sale of ads. This includes accepting payment for placement in the search results, a practice largely pioneered by GoTo.com (later Overture), now part of Yahoo.


There are currently no official plans to expand Baidu’s services to languages other than Chinese, and there is little need to. China will soon become the country with the most citizens online. This does not include the significant number of Chinese-speaking websurfers living in places like Taiwan and Singapore, or the million plus Canadians of Chinese ancestry. That said, Baidu’s spider works automatically and needs to be able to consider and handle languages other than Chinese. The Baidu spider has visited my own humble website, which is almost exclusively written in English, and the website for Baidu.com itself can handle queries in English. So although China and the Chinese language remain the focus, as Baidu indexes more and more of the web it will continue to expand the portion of its index which is not Chinese.

Other reasons Baidu continues to focus on the Chinese market are their ability to compete well for Chinese-speaking staff, the scarcity of individuals who truly understand search engines, particularly as they relate to the parsing of natural Chinese, and the significant training costs of hiring non-Chinese-speaking R&D staff.

The uniqueness of Chinese, which is compounded online, is a hurdle Baidu has had to overcome. The problems presented by the two rival character sets, the numerous competing encoding standards, and the difficulty of parsing natural Chinese, particularly things such as idioms and proverbs, now provide a barrier to entry against competitors such as Google and Microsoft.
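To make the encoding problem a little more concrete, here is a minimal Python sketch. It is only an illustration and has nothing to do with Baidu's actual code. The phrase, the three codecs chosen, and the helper names `encode_all` and `try_decode` are my own assumptions. It shows that the same two characters turn into different bytes under GB2312, Big5, and UTF-8, and that guessing the wrong encoding gives either an error or garbage.

```python
# Illustrative only: the phrase and helper names are assumptions, not Baidu's code.
PHRASE = "中文"
ENCODINGS = ("gb2312", "big5", "utf-8")


def encode_all(text):
    """Encode the same text under each encoding and return the raw bytes by name."""
    return {name: text.encode(name) for name in ENCODINGS}


def try_decode(raw, encoding):
    """Decode raw bytes with the given encoding, or return None if they are invalid."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None


encoded = encode_all(PHRASE)
for name, raw in encoded.items():
    # Same characters, different bytes on the wire.
    print(name, raw.hex())

# A spider that guesses the wrong encoding gets nothing useful back.
print(try_decode(encoded["gb2312"], "utf-8"))
```

A spider has to work out which of these encodings a page uses before it can index a single word, which is part of why handling Chinese well is harder than it looks.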

For example, one of my personal favorites is 不三不四, which literally translates as “not three not four”. However, not only is the actual meaning dubious/shady according to my computer’s dictionary, but it is also a bit rude according to Chinese people I’ve talked to. The characters themselves are very common, which is why I took a liking to it. All of the characters are only a few strokes and easily recognizable to any novice Chinese language student. Handling expressions and idioms like this is particularly problematic for search engines.

Chinese people have fundamentally different preferences in how they use the Internet. This can be observed by comparing Chinese portals such as Sohu.com to American counterparts such as Yahoo.com. The design of the Chinese site is considerably busier, with all manner of blinking, flashing, and moving parts. Baidu.com is visibly similar to Google.com, but when you click on search results in Baidu it spawns a new window, which is a preference of Chinese web surfers, whereas Western web surfers prefer the opposite. I also got the impression that Chinese web searchers tend to use more natural language queries, as opposed to simply entering a keyword or phrase as is done in the West. This further emphasizes the importance to Baidu of algorithmically being able to understand and index natural Chinese.

Baidu has also invested heavily in technology to extract text from non-textual sources such as MP3s and videos. An additional focus of R&D along these lines is information filtering. Furthermore, documents in .doc or .pdf format, for instance, are first converted to plain text and then indexed. Baidu also has the ability to adjust its algorithm to place greater emphasis on the timeliness of data. This is used in their News search.
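Guo Dan didn't go into how the timeliness adjustment works, so what follows is purely my own sketch of the general idea and not Baidu's algorithm. The `Doc` class, the `recency_weight` knob, and the 24 hour half-life are all assumptions made for illustration. Each document gets a freshness value that decays with age, and one weight decides how much freshness counts against relevance.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    title: str
    relevance: float  # assumed to be pre-computed, between 0 and 1
    age_hours: float  # time since the document was published


def score(doc, recency_weight, half_life_hours=24.0):
    """Blend relevance with a freshness value that halves every half_life_hours."""
    freshness = 0.5 ** (doc.age_hours / half_life_hours)
    return (1.0 - recency_weight) * doc.relevance + recency_weight * freshness


def rank(docs, recency_weight):
    """Return docs sorted best-first for the given emphasis on timeliness."""
    return sorted(docs, key=lambda d: score(d, recency_weight), reverse=True)


docs = [
    Doc("old but relevant", relevance=0.9, age_hours=240),
    Doc("fresh but weaker", relevance=0.5, age_hours=1),
]

# A low weight behaves like ordinary web search, a high weight like news search.
print([d.title for d in rank(docs, recency_weight=0.1)])
print([d.title for d in rank(docs, recency_weight=0.8)])
```

With the low weight the older, more relevant page comes first. With the high weight the fresh page wins, which is the behaviour you would want from a news search.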

I asked Guo Dan about the possibility of a Baidu blog search, and he said they were of course looking at it. He didn’t anticipate it being difficult to implement. He said Baidu’s news search already places the emphasis on timeliness while decreasing the emphasis on relevance, which is what a blog search engine would do. One piece of advice I’d give Baidu, and something I have to do at Muschamp.ca, is to wrap a span tag around the main Chinese character phrases in their UI, with a title attribute containing an English translation. Baidu already makes extensive use of subdomains, so it is possible to figure out which parts of Baidu do what using the domain and some patience. But a span tag around the characters, like this: <span title="Chinese">中文</span>, doesn’t visibly alter the site and would likely improve the usability of Baidu for users who are not fluent in Chinese.
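The span idea is easy to automate. Here is a rough Python sketch of how a site could generate those tags from a small glossary. The `TRANSLATIONS` table and the `annotate` function are hypothetical names I made up for this post, and the labels are just examples, not a list taken from Baidu's pages.

```python
from html import escape

# Hypothetical glossary of UI labels and their English translations.
TRANSLATIONS = {
    "中文": "Chinese",
    "新闻": "News",
    "图片": "Images",
}


def annotate(label, translations):
    """Wrap a UI label in a span whose title attribute holds the English translation.

    Labels without a known translation are returned escaped but otherwise unchanged.
    """
    english = translations.get(label)
    if english is None:
        return escape(label)
    return '<span title="{}">{}</span>'.format(escape(english), escape(label))


for label in TRANSLATIONS:
    print(annotate(label, TRANSLATIONS))
```

Browsers show the title text as a tooltip on hover, so the page looks the same to fluent readers while everyone else gets a hint.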

We never really got into the whole Web 2.0 thing. That said, when I had lunch with Edwyn Chan, we never talked about that either, and it is the focus of his entrepreneurial efforts. We also never talked VC financing or anything like that. The interview was focussed on R&D in China from a global perspective. Just that day I had bought an issue of Forbes Asia, as it mentioned one of Baidu’s backers, Draper Fisher Jurvetson ePlanet Ventures, and particularly Finian Tan, on whom they had a feature. As further proof of how small the world really is, just this evening an email arrived in my mailbox from the Tsinghua MBA mailing list concerning a job opportunity with that very firm. I even applied.

For the record, I am not a regular reader of Forbes Asia. I much prefer The Economist. I find the quality of writing better, the content less American-centric, and the ads less frequent. Thus the ratio of English words to Yen or RMB spent is much higher. I discovered this while living in Japan. Time Asia or Japan or whatever is particularly bad for this. It costs as much as or more than The Economist, but you can read it so quickly. It is mostly pictures and ads. It is not worth its price in Japan, whereas I felt The Economist was. As a result, while abroad in Asia, The Economist is my magazine of choice.

Also for the record, I don’t buy into the whole Web 2.0 hype. The internet was always intended to be participatory. Tim Berners-Lee’s original concept of the web and the home page was just that: your home on the Internet. I know I’ve had a home page online for over ten years, as has, say, Jeffrey Zeldman. Of course some people have done a little more with their homepage than others. Now that I have my own computer I’ve become a more prolific publisher.


Posts on Muskblog © Andrew "Muskie" McKay comments not necessarily so...