MongoDB: a full-text index is actually slower than a regex

  linkbg · 2017-06-07 00:53:46 +08:00 · 3542 views

    Version:

    > version()
    3.4.3
    

    Create the full-text index:

    > db.tests.createIndex({'a':'text'}) // values of field a are large, over 1024 characters
    

    Query:

    > db.tests.find({'$text':{'$search':'ere'}}).explain('executionStats')
    "executionStats" : {
    		"executionSuccess" : true,
    		"nReturned" : 80018,
    		"executionTimeMillis" : 306877,
    		"totalKeysExamined" : 83546,
    		"totalDocsExamined" : 83546,
    		"executionStages" : {
    			"stage" : "TEXT",
    			"nReturned" : 80018,
    			"executionTimeMillisEstimate" : 306699,
    			"works" : 167095,
    			"advanced" : 80018,
    			"needTime" : 87076,
    			"needYield" : 0,
    			"saveState" : 16525,
    			"restoreState" : 16525,
    			"isEOF" : 1,
    			"invalidates" : 0,
    			"indexPrefix" : {
    				
    			},
    			"indexName" : "a_text",
    			"parsedTextQuery" : {
    				"terms" : [
    					"ii"
    				],
    				"negatedTerms" : [ ],
    				"phrases" : [
    					"ere"
    				],
    				"negatedPhrases" : [ ]
    			},
    			"textIndexVersion" : 3,
    			"inputStage" : {
    				"stage" : "TEXT_MATCH",
    				"nReturned" : 80018,
    				"executionTimeMillisEstimate" : 306649,
    				"works" : 167095,
    				"advanced" : 80018,
    				"needTime" : 87076,
    				"needYield" : 0,
    				"saveState" : 16525,
    				"restoreState" : 16525,
    				"isEOF" : 1,
    				"invalidates" : 0,
    				"docsRejected" : 3528,
    				"inputStage" : {
    					"stage" : "TEXT_OR",
    					"nReturned" : 83546,
    					"executionTimeMillisEstimate" : 305932,
    					"works" : 167095,
    					"advanced" : 83546,
    					"needTime" : 83548,
    					"needYield" : 0,
    					"saveState" : 16525,
    					"restoreState" : 16525,
    					"isEOF" : 1,
    					"invalidates" : 0,
    					"docsExamined" : 83546,
    					"inputStage" : {
    						"stage" : "IXSCAN",
    						"nReturned" : 83546,
    						"executionTimeMillisEstimate" : 1103,
    						"works" : 83547,
    						"advanced" : 83546,
    						"needTime" : 0,
    						"needYield" : 0,
    						"saveState" : 16525,
    						"restoreState" : 16525,
    						"isEOF" : 1,
    						"invalidates" : 0,
    						"keyPattern" : {
    							"_fts" : "text",
    							"_ftsx" : 1
    						},
    						"indexName" : "a_text",
    						"isMultiKey" : true,
    						"isUnique" : false,
    						"isSparse" : false,
    						"isPartial" : false,
    						"indexVersion" : 2,
    						"direction" : "backward",
    						"indexBounds" : {
    							
    						},
    						"keysExamined" : 83546,
    						"seeks" : 1,
    						"dupsTested" : 83546,
    						"dupsDropped" : 0,
    						"seenInvalidated" : 0
    					}
    
    

    1. I don't quite understand why the TEXT_OR and TEXT_MATCH stages are needed.
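
One way to see where the time goes in the TEXT plan is to subtract each child's executionTimeMillisEstimate from its parent's. Below is a minimal sketch in plain JavaScript, assuming the estimates are cumulative (each stage includes its input stage). The numbers are copied from the explain output in this post; the helper name selfTimes is hypothetical, not part of the mongo shell.

```javascript
// Stage tree copied from the $text explain output (only the fields used here).
const textPlan = {
  stage: "TEXT", executionTimeMillisEstimate: 306699,
  inputStage: {
    stage: "TEXT_MATCH", executionTimeMillisEstimate: 306649,
    inputStage: {
      stage: "TEXT_OR", executionTimeMillisEstimate: 305932,
      inputStage: { stage: "IXSCAN", executionTimeMillisEstimate: 1103 }
    }
  }
};

// Hypothetical helper: time attributed to each stage on its own,
// i.e. its estimate minus the estimate of its single input stage.
// Assumes a linear plan (one inputStage per stage), as in this output.
function selfTimes(stage) {
  const rows = [];
  for (let s = stage; s; s = s.inputStage) {
    const child = s.inputStage ? s.inputStage.executionTimeMillisEstimate : 0;
    rows.push({ stage: s.stage, selfMillis: s.executionTimeMillisEstimate - child });
  }
  return rows;
}

const textSelfTimes = selfTimes(textPlan);
console.log(textSelfTimes);
```

Under that assumption, nearly all of the time lands on TEXT_OR, which is also the stage that reports docsExamined: 83546, so the cost appears to be in fetching the large documents rather than in the index scan. As far as I understand, TEXT_MATCH then rechecks the fetched documents against the phrase; here it rejected 3528 of the 83546 candidates, which matches the 80018 returned.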

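    Ideally a full-text index should be faster than a regex, otherwise what is the point of having one? However, the same condition run as a regex gives the result below every time (whether or not the a_text index exists, and even after restarting mongo to rule out hot data):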

    > db.tests.find({'a':{'$regex':'ere','$options':'i'}}).explain('executionStats')
    "executionStats" : {
    		"executionSuccess" : true,
    		"nReturned" : 81319,
    		"executionTimeMillis" : 101701,
    		"totalKeysExamined" : 0,
    		"totalDocsExamined" : 4256954,
    		"executionStages" : {
    			"stage" : "COLLSCAN",
    			"filter" : {
    				"a" : {
    					"$regex" : "ere",
    					"$options" : "i"
    				}
    			},
    			"nReturned" : 81319,
    			"executionTimeMillisEstimate" : 101391,
    			"works" : 4256956,
    			"advanced" : 81319,
    			"needTime" : 4175636,
    			"needYield" : 0,
    			"saveState" : 33964,
    			"restoreState" : 33964,
    			"isEOF" : 1,
    			"invalidates" : 0,
    			"direction" : "forward",
    			"docsExamined" : 4256954
    		}
    
    

    Am I doing something wrong? Any pointers would be appreciated. Thanks.
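
To put the two runs side by side, the average cost per examined document can be worked out from the figures in the two executionStats blocks. This is only a rough sketch in plain JavaScript: the variable names are illustrative, and the per-document average ignores everything except total time and documents examined.

```javascript
// Figures copied from the two executionStats blocks in this post.
const runs = [
  { name: "$text",  millis: 306877, docsExamined: 83546,   nReturned: 80018 },
  { name: "$regex", millis: 101701, docsExamined: 4256954, nReturned: 81319 },
];

// Average wall-clock cost per examined document, in milliseconds.
const perDoc = runs.map(r => ({
  name: r.name,
  millisPerDoc: r.millis / r.docsExamined,
}));

for (const r of perDoc) {
  console.log(`${r.name}: ${r.millisPerDoc.toFixed(4)} ms per examined document`);
}
```

The collection scan touches about fifty times as many documents yet finishes sooner, so the per-document cost of the $text plan is far higher. That would be consistent with random fetches of large documents being much more expensive than a sequential scan, though the output alone does not prove it.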

    2 replies · 2017-06-07 07:49:12 +08:00
    #1 mooncakejs · 2017-06-07 06:48:09 +08:00 via iPhone
    How many documents are there?
    #2 linkbg (OP) · 2017-06-07 07:49:12 +08:00 via iPhone
    @mooncakejs Millions of documents.