How do I write this regular expression?



<dt><a name="313"></a>ADHE 313 (6) Organization of Adult Basic Education Programs</dt>

I want to extract "ADHE 313" and "Organization of Adult Basic Education Programs".


18 replies • 2015-05-29 10:45:43 +08:00

asj

2015-05-28 14:37:22 +08:00

Shouldn't this be done with a CSS/jQuery selector, or XPath?

phx13ye

2015-05-28 14:46:16 +08:00

<\/a>(.*)(.*?)<\/b>

sicongliu

2015-05-28 14:48:02 +08:00

XPath is simpler, but I'd like to learn how to do it with a regex.

shoumu

2015-05-28 14:50:21 +08:00

Take a look at pyquery; it supports jQuery syntax.

professorz

2015-05-28 14:50:44 +08:00

.+<\\/a>(.+)(6)(.+)<\\/b>.+
That is a regex for Java.

sicongliu

2015-05-28 14:53:48 +08:00

How would it be written in Python?

yiyiwa

2015-05-28 15:20:58 +08:00

I tested this in Python. It is incomplete and also returns empty matches.

'\>([^\<]*)\<'

sicongliu

2015-05-28 15:42:05 +08:00

m=re.search("</a>(.*?)\s(",text)
print (m.group(1))

m=re.search("(.*?)(",text)
print (m.group(1))

sicongliu

2015-05-28 15:46:12 +08:00

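What if I want to get "ADHE 313"?
How do I detect the second space? Of course this is easy with string search and slicing, but I'd like to know how to do it with a regex.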

sicongliu

2015-05-28 15:49:25 +08:00

m = re.search(r"</a>(.*?)\s+\(", text)  # raw string keeps \s and \( intact
print(m.group(1))

Of course this approach is clumsy: if the second space is not followed by "(", it won't work.
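
One way to avoid depending on the "(" is to match two whitespace-separated tokens directly. The following is only a sketch, assuming the course code always has the form "LETTERS NUMBER" with no spaces inside either part; the variable names are illustrative.

```python
import re

# Sample line from the original question.
text = '<dt><a name="313"></a>ADHE 313 (6) Organization of Adult Basic Education Programs</dt>'

# \S+ is a run of non-space characters, so \S+\s+\S+ captures exactly two
# space-separated tokens and stops before the second run of whitespace.
m = re.search(r"</a>(\S+\s+\S+)", text)
code = m.group(1) if m else None
print(code)
```

Because the pattern stops at the second run of whitespace, whatever follows the course code (a parenthesis or anything else) does not matter.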

asj

2015-05-28 16:09:03 +08:00

I put together a quick one; it is still far from complete.
(?:<dt.*?>)(?:.*?\/.*?>)([\w ]*)(?:.*?)(?:<\/dt>)

http://regexr.com/3b3bs

2015-05-28 16:21:31 +08:00

This task is much simpler without a regex.

page.xpath("//dt/text()") -> ADHE 313 (6)
page.xpath("//dt/b/text()") -> Organization of Adult Basic Education Programs
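
A rough, standard-library-only sketch of the same idea follows. It uses xml.etree.ElementTree as a stand-in for whatever library `page` comes from, and it assumes the title is wrapped in a `<b>` element, as the `//dt/b/text()` query implies. It only works here because this fragment happens to be well-formed XML; real-world HTML would normally need a proper HTML parser.

```python
import xml.etree.ElementTree as ET

# Assumption: the title sits inside <b>, as the //dt/b/text() query implies.
snippet = ('<dt><a name="313"></a>ADHE 313 (6) '
           '<b>Organization of Adult Basic Education Programs</b></dt>')

dt = ET.fromstring(snippet)

# Text that follows </a> is stored as the "tail" of the <a> element.
tail = dt.find("a").tail or ""
# Drop the parenthesized credit count and surrounding whitespace.
course_code = tail.split("(")[0].strip()

title = dt.findtext("b", default="").strip()

print(course_code)
print(title)
```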

picasso250

2015-05-28 17:13:42 +08:00

/a>([\w ()]+)([\w ]+)
This is the simplest solution to your current problem.

picasso250

2015-05-28 17:16:23 +08:00

Sorry, the previous one was wrong; it also captured the (6).

/a>(\w+ \d+).+?([\w ]+)

leozy2014

2015-05-28 17:37:10 +08:00

print(re.findall(r'</a>(.*?) \(6\) (.*?)</dt>', s))
#[('ADHE 313', 'Organization of Adult Basic Education Programs')]
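
The pattern above hard-codes "(6)". A slightly more general sketch, assuming every entry has a parenthesized integer between the course code and the title, replaces it with `\(\d+\)`:

```python
import re

# Sample line from the original question.
s = '<dt><a name="313"></a>ADHE 313 (6) Organization of Adult Basic Education Programs</dt>'

# \(\d+\) matches any parenthesized number instead of the literal "(6)";
# \s* absorbs the spaces on either side so they stay out of the groups.
pattern = re.compile(r'</a>(.*?)\s*\(\d+\)\s*(.*?)</dt>')

matches = pattern.findall(s)
print(matches)
```

With one `<dt>` per line, the same compiled pattern can be applied to a whole page with `findall`.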

wmttom

2015-05-28 17:46:15 +08:00

Python regex: (?<=>)[\w, ,\(,\)]+?(?= \(|<)

re.findall("(?<=>)[\w, ,\(,\)]+?(?= \(|<)", '<dt><a name="313"></a>ADHE 313 (6) Organization of Adult Basic Education Programs</dt>')

['ADHE 313', 'Organization of Adult Basic Education Programs']

sicongliu

2015-05-29 10:38:54 +08:00

Neither of the two replies above seems to work.

sicongliu

2015-05-29 10:45:43 +08:00

Sorry, this one works:

print(re.findall(r'</a>(.*?) \(6\) (.*?)</dt>', s))
#[('ADHE 313', 'Organization of Adult Basic Education Programs')]