Reply to thread

Message: <blockquote data-quote="sp3co" data-source="post: 22710416" data-attributes="member: 559702">I need some help from the Python experts. Has anyone here done a project with the UCSC Sinhala Corpus + NLTK?

I'm using the UCSC corpus for a project. I tried to open it the same way you open a custom corpus in NLTK (because I want to use the NLTK functions), but I get a Unicode error like this:

Traceback (most recent call last):
  File "/home/xxxx/PycharmProjects/testing/readcorpus.py", line 13, in <module>
    file = read_file.read()
  File "/home/xxxx/.virtualenvs/PycharmProjects/lib/python3.5/codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte

This is my code. The readpath is correct, but the error appears when it tries to read that file:

from nltk.corpus import PlaintextCorpusReader
corpus_root = './resources/corpus/UCSC-Sinhala-News-Corpus/UCSC-Sinhala-News-Corpus-V1'
sinhala_corpus = PlaintextCorpusReader(corpus_root, '.*')
print(sinhala_corpus.fileids())
readpath = './resources/corpus/UCSC-Sinhala-News-Corpus/UCSC-Sinhala-News-Corpus-V1/News Corpus_V1/NPED0001.TXT'
read_file = open(readpath, 'r', encoding='utf-8')
file = read_file.read()

Why does this happen, and how do I fix it?

When I open that text file in Notepad, its type is shown as Unicode. But if I save it as UTF-8, the error goes away and the file reads fine. How can I fix this without changing the file type like that?

I looked on Stack Overflow for how to read a Unicode file, and it says that doing it this way should work.</blockquote>
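
Byte 0xff at position 0 is how a UTF-16 little-endian byte order mark (FF FE) begins, and "Unicode" in Notepad's save dialog means UTF-16 LE, so the file is most likely UTF-16 rather than UTF-8. Here is a minimal sketch, assuming that diagnosis holds for your corpus files. The helper names `detect_bom_encoding` and `read_text` and the temporary sample file are made up for illustration; they are not part of NLTK or the corpus.

```python
import codecs
import os
import tempfile


def detect_bom_encoding(path, default="utf-8"):
    """Guess a text encoding from the file's byte order mark (BOM), if any."""
    with open(path, "rb") as f:
        head = f.read(4)
    # Check UTF-32 first: the UTF-32 LE BOM begins with the UTF-16 LE BOM bytes.
    if head.startswith((codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE)):
        return "utf-32"
    if head.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    if head.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    # No BOM found: fall back to the caller's default (an assumption, not a detection).
    return default


def read_text(path):
    """Read a text file using the encoding suggested by its BOM."""
    with open(path, "r", encoding=detect_bom_encoding(path)) as f:
        return f.read()


# Stand-in for a corpus file: Python's "utf-16" codec writes a BOM, like Notepad's "Unicode".
sample_text = "සිංහල"
sample_path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(sample_path, "w", encoding="utf-16") as f:
    f.write(sample_text)

detected = detect_bom_encoding(sample_path)
recovered = read_text(sample_path)
print(detected, recovered == sample_text)
```

If the check reports `utf-16` for your file, then `open(readpath, 'r', encoding='utf-16')` should read it without re-saving. `PlaintextCorpusReader` also takes an `encoding` argument, so `PlaintextCorpusReader(corpus_root, '.*', encoding='utf-16')` is worth trying, assuming every file in the corpus uses the same encoding.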
