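Here I parse the XML with DOM, because it is simple.
1 The customer first fills in a form in a VB application, which generates an apply.xml file.
In VB I use MSXML 4.0. If no encoding is specified, the file is saved as UTF-8 by default.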
Set dom = CreateDOM
Set node = dom.createProcessingInstruction("xml", "version='1.0'")
dom.appendChild node
Set node = Nothing
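2 Next, the customer uploads this XML file to the server through the web.
PHP's DOM XML extension works with UTF-8 by default, so the uploaded file can be parsed directly, without any conversion, to extract information from it:
if (!$dom = domxml_open_mem($content)) {
    $t->assign('msg', "File parsing error!");
    $t->render('noavailable.html', PAGE_TITLE, 'wrap.html');
    exit;
}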
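For comparison, the same "parse the uploaded bytes or report an error" check can be sketched in Java. This is only an illustration, not part of the original PHP site: the class name UploadedXml and the method parseOrNull are hypothetical, and refusing DOCTYPE declarations is an extra precaution I am assuming is acceptable for this kind of upload.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

// Hypothetical helper: parse uploaded bytes as XML, or signal failure.
final class UploadedXml {
    /** Returns the parsed document, or null if the bytes are not well-formed XML. */
    static Document parseOrNull(byte[] content) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Assumption: uploads never need a DOCTYPE, so refuse them outright.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // With no encoding declaration, the parser treats the bytes as UTF-8.
            return builder.parse(new ByteArrayInputStream(content));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            return null; // the caller shows the "file parsing error" page
        }
    }
}
```

A caller would treat a null result the same way the PHP code treats a failed domxml_open_mem call: show the error page and stop.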
Next, the file has to be stored in the database. The database is MS SQL Server, which does not support UTF-8 data, so I store the whole file as binary. What cost me a long time was how to insert the binary data. With MySQL it can be stored directly without any conversion, but not with MSSQL. The reason is:
This is because the MSSQL parser makes a clear distinction between binary and character constants. You can therefore not easily insert binary data with "column = '$data'" syntax like in MySQL and others.
The MSSQL documentation states that binary constants should be represented by their unquoted hexadecimal byte-string. That is, to set the binary column "col" to contain the bytes 0x12, 0x65 and 0x35 you should do "col = 0x126535" in your query.
The concrete steps are as follows:
// Read the uploaded file
$original = $_FILES['content']['name'];
if (!empty($original)) {
    if ($_FILES['content']['type'] == "text/xml") {
        $filename = $_FILES['content']['tmp_name'];
        $handle = fopen($filename, "rb");
        $originalcontent = fread($handle, filesize($filename));
        fclose($handle);
    }
} // end if (!empty($original))
$originalcontent = unpack("H*hex", $originalcontent); // the key step: turn the bytes into a hex string
$db->query("insert into ".TBL_SB_ONLINE_USER." (sb_id, user_id, username, sbmc, content, created_date) values ("
    .$newid.", "
    .$u.", "
    .$db->quote(stripslashes($name)).", "
    .$db->quote(stripslashes($sbmc)).", 0x"
    .$originalcontent['hex'].", " // note the 0x prefix just before the hex string
    ."'$now')");
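The conversion that unpack("H*hex", ...) performs, plus the 0x prefix, can be sketched in Java as follows. The class name BinaryLiteral and the method toSqlBinaryLiteral are hypothetical names chosen for this illustration; the output format follows the unquoted hexadecimal byte-string described in the quote above.

```java
// Hypothetical helper: build an MSSQL binary constant such as 0x126535 from raw bytes.
final class BinaryLiteral {
    private static final char[] HEX = "0123456789abcdef".toCharArray();

    /** Returns "0x" followed by two lowercase hex digits per byte, high nibble first. */
    static String toSqlBinaryLiteral(byte[] data) {
        StringBuilder sb = new StringBuilder(2 + data.length * 2).append("0x");
        for (byte b : data) {
            sb.append(HEX[(b >> 4) & 0x0F]); // high nibble
            sb.append(HEX[b & 0x0F]);        // low nibble
        }
        return sb.toString();
    }
}
```

For the bytes 0x12, 0x65 and 0x35 this yields the string 0x126535, which goes into the SQL text without quotes. In Java itself, binding the bytes through a parameterized statement (PreparedStatement.setBytes) would normally avoid building the literal by hand; the sketch only mirrors what the PHP code does.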
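3 After the upload, the user can also edit the file online. For that, the file is read back from the database and parsed as UTF-8. Although unpack was used when storing, no reverse conversion is needed when reading: the hex string was only the SQL literal syntax, and the column returns the original bytes.
$sb = $db->getRow('select sbmc, content from '.TBL_SB_ONLINE_USER." where sb_id = $sb_id");
$originalcontent = $sb['content'];
if (!$dom = domxml_open_mem($originalcontent)) {
    $t->assign('msg', "File parsing error!");
    $t->render('noavailable.html', PAGE_TITLE, 'wrap.html', true);
    exit;
}
$context = xpath_new_context($dom);
$xpath = $context->xpath_eval("//material/xm");
// Convert the node text from UTF-8 to GBK before assigning it to the template
$t->assign('xm', iconv("UTF-8", "GBK", $xpath->nodeset[0]->get_content()));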
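The read-and-query step can be sketched in Java as well, using the same XPath expression //material/xm. The class name ApplyXmlReader and the method firstXm are hypothetical; the sketch assumes the stored bytes have already been fetched from the content column.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

// Hypothetical helper: parse the stored bytes and read the first //material/xm value.
final class ApplyXmlReader {
    /** Returns the text of the first matching node, or "" if none matches. */
    static String firstXm(byte[] storedBytes) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(storedBytes));
            // Evaluating to a string gives the text of the first node in document order.
            return XPathFactory.newInstance().newXPath().evaluate("//material/xm", doc);
        } catch (ParserConfigurationException | SAXException
                | IOException | XPathExpressionException e) {
            throw new IllegalArgumentException("stored content is not well-formed XML", e);
        }
    }
}
```

The result is an ordinary Java String, so nothing like the iconv call is needed at this point; the output encoding is chosen later, when the page is written.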
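When reading, watch the text size limit. The Microsoft OLE DB Provider for SQL Server and the SQL Server ODBC driver automatically set @@TEXTSIZE to its maximum of 2 GB; for every other client the default is 4096 bytes (4 KB). So when accessing the database from PHP, be sure to set the following in php.ini:
mssql.textlimit = 2147483647
mssql.textsize = 2147483647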
4 The back end is written in VB. To parse the file there, add the following code, which decodes a UTF-8 byte array (Byte()) into a VB string:
Public Declare Function MultiByteToWideChar Lib "kernel32" (ByVal CodePage As Long, ByVal dwFlags As Long, ByVal lpMultiByteStr As Long, _
ByVal cchMultiByte As Long, ByVal lpWideCharStr As Long, ByVal cchWideChar As Long) As Long
Public Const CP_UTF8 = 65001
Public Function UTF8_Decode(bUTF8() As Byte) As String
    Dim lRet As Long
    Dim lLen As Long
    Dim lBufferSize As Long
    Dim sBuffer As String
    Dim bBuffer() As Byte
    lLen = UBound(bUTF8) + 1
    If lLen = 0 Then Exit Function
    lBufferSize = lLen * 2
    sBuffer = String$(lBufferSize, Chr(0))
    lRet = MultiByteToWideChar(CP_UTF8, 0, VarPtr(bUTF8(0)), lLen, StrPtr(sBuffer), lBufferSize)
    If lRet <> 0 Then
        sBuffer = Left(sBuffer, lRet)
    End If
    UTF8_Decode = sBuffer
End Function
The database read itself looks like this:
Dim varcontent() As Byte
varfilesize = mrc.Fields("content").ActualSize
varcontent = mrc.Fields("content").GetChunk(varfilesize)
content = UTF8_Decode(varcontent)
XMLDoc.async = False
XMLDoc.resolveExternals = False
XMLDoc.loadXML (content)
If (XMLDoc.parseError.errorCode <> 0) Then
    Dim myErr
    Set myErr = XMLDoc.parseError
    MsgBox ("An error occurred: " & myErr.reason)
Else
    XMLDoc.setProperty "SelectionLanguage", "XPath"
End If
5 On a Java back end this is even easier: read the column as a byte[] and decode it as a UTF-8 string.
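A minimal sketch of that decoding step, assuming the bytes have already been read from the content column. The class name Utf8Column and both method names are hypothetical. The strict variant is an addition of mine: it rejects input that is not valid UTF-8, which is one way to notice a value that was cut off in the middle of a multi-byte character.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: turn the bytes of the binary column back into text.
final class Utf8Column {
    /** Lenient decode: invalid sequences are replaced with U+FFFD. */
    static String decode(byte[] columnBytes) {
        return new String(columnBytes, StandardCharsets.UTF_8);
    }

    /** Strict decode: throws if the bytes are not valid UTF-8. */
    static String decodeStrict(byte[] columnBytes) {
        try {
            return StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(columnBytes))
                    .toString();
        } catch (CharacterCodingException e) {
            throw new IllegalArgumentException("column does not hold valid UTF-8", e);
        }
    }
}
```

Unlike the VB version, no Win32 call is needed, because the charset conversion is part of the standard library.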
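Finally, PHP really is a very powerful scripting language. If you run into a hard problem during PHP development that even Google cannot easily answer, go straight to the online documentation at php.net. Many kind people have posted notes from their own experience there, and they are very helpful.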