Redis uses RDB files to persist data. Yesterday I used redis-rdb-tools to dump data from an RDB file to JSON format. After that, I wrote Scala code to read the data from the JSON file and load it into Redis.
First, I found that the JSON file was almost 50% bigger than the RDB file. After checking the whole JSON file, I confirmed that the root cause was not redundant JSON symbols such as braces and brackets, but the Unicode escaping in the JSON format, especially the six-character escape “\u0001” standing in for the single ASCII byte 0x01. Therefore I wrote code to replace it:
val key: String = line.stripPrefix("\"").stripSuffix("\"").replaceAll("\\\\u0001", "^A")
After that, the size of the output became normal again.
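The inflation is easy to reproduce: every 0x01 control byte in a value becomes the six-character sequence \u0001 in the JSON dump. A minimal sketch of the effect (the field names here are made up for illustration):

```scala
object EscapeInflation {
  def main(args: Array[String]): Unit = {
    // a raw value whose fields are separated by the control byte 0x01
    val raw = "field1" + "\u0001" + "field2"        // 13 characters
    // a JSON encoder emits the 0x01 byte as the six-character escape \u0001
    val escaped = raw.replace("\u0001", "\\u0001")  // 18 characters
    println(s"raw: ${raw.length}, escaped: ${escaped.length}")
    // prints: raw: 13, escaped: 18
  }
}
```

One separator byte per field in every key adds up quickly over a 100 GB dump, which is consistent with the roughly 50% size difference I saw.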
There was still another problem. To read the JSON file line by line, I used code from http://naildrivin5.com/blog/2010/01/26/reading-a-file-in-scala-ruby-java.html:
import scala.io._
object ReadFile extends Application {
  val s = Source.fromFile("some_file.txt")
  s.getLines.foreach( (line) => {
    println(line.trim.toUpperCase)
  })
}
But in my test this code appeared to load all the data from the file before running “foreach”. Since my file is bigger than 100 GB, that would cost far too much time and RAM.
The best way is to use the buffered I/O classes from java.io:
import java.io.{BufferedReader, File, FileReader}

val dataFile = new File("my.txt")
val reader = new BufferedReader(new FileReader(dataFile))
var line: String = reader.readLine()
while (line != null) {
  // do something with the current line
  line = reader.readLine()
}
reader.close()
This reads the file truly “line by line”.
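Putting the two pieces together, a minimal self-contained sketch of the whole pass looks like this. The file name "my.txt" is a placeholder, and the println stands in for the real Redis write, which depends on whichever client library you use:

```scala
import java.io.{BufferedReader, File, FileReader}

object LoadDump {
  def main(args: Array[String]): Unit = {
    val dataFile = new File("my.txt")  // placeholder path to the JSON dump
    val reader = new BufferedReader(new FileReader(dataFile))
    try {
      var line: String = reader.readLine()
      while (line != null) {
        // strip the surrounding quotes and undo the \u0001 escaping
        val key = line.stripPrefix("\"").stripSuffix("\"")
          .replaceAll("\\\\u0001", "^A")
        println(key)  // in the real job: write the key into Redis
        line = reader.readLine()
      }
    } finally {
      reader.close()  // release the file handle even if processing fails
    }
  }
}
```

Wrapping the loop in try/finally makes sure the reader is closed even when a malformed line throws halfway through a 100 GB file.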