Getting our backend Go-ing
Our developer Chopper is getting started with Golang to up his backend game and shares his insights. We used Go to great effect on the Myanmar election project.
Since the latter half of last year, my teammates at Wiredcraft have started using tools written in Golang (also known simply as Go) and using Go directly on projects, such as building the voter registration software for the Myanmar elections last year, their first in a quarter century. Check out the tech we used and why we chose Golang for that project. As a fan of strongly typed programming languages, I think it’s certainly time to check out this relatively young one.
Go, like other programming languages, offers many standard libraries to help you handle files by buffer, position, or line. However, unlike JavaScript (Node.js), Go doesn’t ship any asynchronous input-output (I/O) libraries; instead, you can easily use Go’s concurrency to implement non-blocking I/O. As a beginner with this language, I’ll walk through a few methods for finding and extracting information from files with Go.
Examples
You can explore the demo code in one of my GitHub repos, query_file_demo.
OK, let’s start with the simplest way: using the ioutil lib.
func SimpleReader(path string) string {
	f, err := ioutil.ReadFile(path)
	CheckError(err)
	lines := strings.Split(string(f), "\n")
	re := regexp.MustCompile(`\bslowpoke\b`)
	var result string
	for _, line := range lines {
		if re.MatchString(line) {
			result = line
		}
	}
	return result
}
After loading the file, strings.Split() converts the content into a string slice, and re.MatchString() matches the regular expression against each element of the slice. Be careful: you should not read a file with the ioutil.ReadFile() method if the file is too large to load into memory at once. You can see why in the Go lib’s source code:
func ReadFile(filename string) ([]byte, error) {
	...
	return readAll(f, n+bytes.MinRead)
}
Here, n is the size of the file, and you’ll get a bytes.ErrTooLarge error if the file overflows the buffer.
Next, you can try to use the bufio lib:
func Scanner(path string) string {
	f, err := os.Open(path)
	CheckError(err)
	defer f.Close()
	var result string
	scanner := bufio.NewScanner(f)
	re := regexp.MustCompile(`\bslowpoke\b`)
	for scanner.Scan() {
		s := scanner.Text()
		if re.MatchString(s) {
			result = s
		}
	}
	return result
}
Again, you can read the source code of the NewScanner function:
const (
	// MaxScanTokenSize is the maximum size used to buffer a token
	// unless the user provides an explicit buffer with Scan.Buffer.
	// The actual maximum token size may be smaller as the buffer
	// may need to include, for instance, a newline.
	MaxScanTokenSize = 64 * 1024

	startBufSize = 4096 // Size of initial allocation for buffer.
)

func NewScanner(r io.Reader) *Scanner {
	return &Scanner{
		r:            r,
		split:        ScanLines,
		maxTokenSize: MaxScanTokenSize,
	}
}
In this snippet, MaxScanTokenSize is the maximum size of each token (by default, each line). Since the scanner reads only one line at a time, the file itself can be as large as you want, as long as no single line exceeds MaxScanTokenSize. If you don’t want the file to be read line by line, you can switch to file.Read() or io.Copy(); these APIs let you define a buffer size for reading a file.
Finally, if you want to use Go’s concurrency to make the job run faster, you can use channels and goroutines.
func ChannelReader(path string) string {
	workers := 10
	f, err := os.Open(path)
	CheckError(err)
	defer f.Close()

	jobs := make(chan string)
	// Buffer the results channel so a worker that finds a match
	// doesn't block on it before signaling completion.
	results := make(chan string, workers)
	complete := make(chan bool)

	// Producer: feed lines into the jobs channel.
	go func() {
		scanner := bufio.NewScanner(f)
		for scanner.Scan() {
			jobs <- scanner.Text()
		}
		close(jobs)
	}()

	for i := 0; i < workers; i++ {
		go grepLine(jobs, results, complete)
	}
	for i := 0; i < workers; i++ {
		<-complete
	}
	return <-results
}

func grepLine(jobs <-chan string, results chan<- string, complete chan<- bool) {
	re := regexp.MustCompile(`\bslowpoke\b`)
	for j := range jobs {
		if re.MatchString(j) {
			results <- j
		}
	}
	complete <- true
}
This code creates 10 goroutines to do the grep job. As each goroutine finishes, it sends true on the complete channel, so the blocking <-complete reads in the main function can proceed.
Benchmark
Go’s testing package not only supports automated testing of Go packages, it also includes benchmarking tools. All you need to do is write your benchmark function and run the command below; the -benchmem flag adds memory-allocation statistics to the results.
go test -bench=. -benchmem
When I ran the test, I used a small file, so the results and bench directories in my repo merely show how to benchmark a function. Here are the results I collected on my machine:
testing: warning: no tests to run
PASS
BenchmarkChannelReader-4 grep line: 79,slowpoke,79,12,360,63,98,1
1000 1712159 ns/op 490096 B/op 1149 allocs/op
BenchmarkScanner-4 79,slowpoke,79,12,360,63,98,1
1000 1766001 ns/op 77130 B/op 846 allocs/op
BenchmarkSimpleReader-4 grep line: 79,slowpoke,79,12,360,63,98,1
1000 1791945 ns/op 112651 B/op 37 allocs/op
ok github.com/chopperlee2011/query_file_demo/bench 5.816s
Conclusion
I’m glad to know that Go provides such convenient and fast primitives for concurrency, which can help developers break through technical bottlenecks they might hit when building apps in other languages. On top of that, the performance of Go tools and frameworks such as Gin and NSQ really impresses me.
If you want to dig more into Go, I think Gopher Academy is a great place to start and you can chat with and meet more Go gophers in their Slack channel. The annual conference, Gophercon, took place last month, so look out for updates from that and look for announcements about next year’s.
I hope you enjoy exploring this cool language as much as I do. It really helped us a lot with the Myanmar project. If you have any thoughts to share about Golang or its related tools, send us an email ([email protected]) or ping us on Twitter (@wiredcraft).