I am making a script to refactor template-enhanced html.
I'd like beautifulsoup to print verbatim any code inside {{ }} without transforming > or other symbols into html entities.
It needs to split a file with many templates into many files, each with one template.
Spec: splitTemplates templates.html templateDir must:
- read templates.html
- for each
<template name="xxx">contents {{ > directive }}</template>, write this template to filetemplateDir/xxx
Code snippet:
soup = bs4.BeautifulSoup(open(filename,"r"))
for t in soup.children:
if t.name=="template":
newfname = dirname+"/"+t["name"]+ext
f = open(newfname,"w")
# note f.write(t) fails as t is not a string -- prettify works
f.write(t.prettify())
f.close()
Undesired behavior:
{{ > directive }} becomes {{ > directive }} but needs to be preserved as {{ > directive }}
Update: f.write(t.prettify(formatter=None)) sometimes preserves the > and sometimes changes it to >. This seems to be the most obvious change, not sure why it changes some > but not others.
Solved: Thanks to Hai Vu and also https://stackoverflow.com/a/663128/103081
import HTMLParser
U = HTMLParser.HTMLParser().unescape
soup = bs4.BeautifulSoup(open(filename,"r"))
for t in soup.children:
if t.name=="template":
newfname = dirname+"/"+t["name"]+ext
f = open(newfname,"w")
f.write(U(t.prettify(formatter=None)))
f.close()